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Preface 



This book contains a selection of papers presented at the seventh CLIN (Computa- 
tional Linguistics in the Netherlands) meeting. The meeting was held on Novem- 
ber 15, 1996, at IPO, on the premises of the Eindhoven University of Technology. 

The aim of the annual CLIN meetings is to provide an opportunity for compu- 
tational linguists to report on their work. The CLIN meeting also functions as an 
informal meeting place, primarily for Dutch and Belgian computational linguists, 
but with an increasing international participation. We were especially happy that 
Stephen Pulman (SRI International and University of Cambridge) was willing to 
act as our keynote speaker. 

About 70 participants attended the meeting, the program listed 24 present- 
ations, of which 16 were submitted for inclusion in the proceedings. After the 
reviewing procedure 13 papers remained, which you will find in this book, pre- 
ceded by Pulman’s invited paper. The contents of the CLIN proceedings will also 
be made available electronically, via World Wide Web, as a subpage of CLIN’s 
Home Page: 

http : / /odur . let . rug . nl/~vannoord/ clin/clin . html 

We would like to thank all those who contributed to CLIN VII: the speakers, the 
participants, the external reviewers (Gosse Bouma, Walter Daelemans, Frank van 
Eynde, Theo Janssen, Anton Nijholt, Remko Scha and Gertjan van Noord) and 
the people of IPO’s service department. We also thank IPO and the NWO Priority 
Programme on Language and Speech Technology for sponsoring the meeting. 

CLIN VII was held in a period that IPO went through a drastic transformation 
process, which unfortunately resulted in the dissolution of the Language Group, 
to which the organizers belonged. We thank our CLIN colleagues for their moral 
support. 

The eighth CLIN meeting will be held in Nijmegen on December 12, 1997. We 
are looking forward to seeing you there. 



Eindhoven, October 1997 

Jan Landsbergen, Jan Odijk, Kees van Deemter and Gert Veldhuijzen van Zanten. 



Author Index 



Gosse Bouma 25 

Crit Cremers 41 

Walter Daelemans 207 

Maarten Hijzelendoorn 41 

Esther Klabbers 57 

Dimitra Kolliakou 69 

Wessel Kraaij 115 

Emiel Krahmer 85 

Anna Kupsc 129 

Carlos Martm-Vide 101 

Gheorghe Paun 101 

Renee Pohlmann 115 

Adam Przepiorkowski 129 

Stephen G. Pulman 1 

Mieke Rats 145 

Ineke Schuurman 161 

Mariet Theune 177 

Kees van Deemter 85 

Frank van Eynde 191 

Jorn Veenstra 207 

Jakub Zavrel 207 



Table of Contents 

Preface iii 

Author Index iv 

Invited paper: 

Stephen G. Pulman 

Conversational Games, Belief Revision and Bayesian Networks 1 

Gosse Bouma 

Valence Alternation without Lexical Rules 25 

Crit Cremers and Maarten Hijzelendoorn 

Filtering Left Dislocation Chains in Parsing Categorial Crammer 41 

Esther Klabbers 

Speech Output Generation in GoalGetter 57 

Dimitra Kolliakou 

Possessive Affixes and Complement Composition 69 

Emiel Krahmer and Kees van Deemter 

Presuppositions as Anaphors Revisited 85 

Carlos Martm-Vide and Georghe Paun 

Modeling Coordination by Means of Operations on Strings and 

on Derivation Trees 101 

Renee Pohlmann and Wessel Kraaij 

Improving the Precision of a Text Retrieval System 

with Compound Analysis 115 

Adam Przepiorkowski and Anna Kupsc 

Unbounded Negative Concord in Polish: A Lexicalist HPSG Approach 129 

Mieke Rats 

Information Update in Dutch Information Dialogues 145 

Ineke Schuurman 

ANNO: a Multi-functional Flemish Text Corpus 161 

Mariet Theune 

GoalGetter: Predicting Contrastive Accent in Data- to- Speech Generation . 177 
Frank van Eynde 

On the Notion ‘ Minor Category’ 191 

Jakub Zavrel, Walter Daelemans and Jorn Veenstra 

Resolving PP attachment Ambiguities with Memory-Based Learning 207 

Addresses of Authors 223 



v 



( 



Invited paper: 



Conversational Games, Belief Revision and 
Bayesian Networks 

Stephen G. Pulman* 



Abstract 

The paper uses a simple and abstract characterization of dialogue in terms 
of mental state changes of dialogue participants to raise three fundamental 
questions for any theory of dialogue. It goes on to discuss currently popular 
accounts of dialogue with respect to these three questions. Next, the notion of 
‘conversational game’ is revisited within a probabilistic and decision theoretic 
framework, and it is argued that such an interpretation is plausible both 
intuitively and as the basis for computational implementation. An illustrated 
sketch of a proposed implementation using Bayesian networks is described. 



Three Questions for Dialogue 

A simple, rather abstract description of a canonical dialogue is that it consists 
of a sequence of utterances with a corresponding sequence of mental states of the 
participants in the dialogue. Person A has a sequence of mental states S^i . . . S^ n+ i 
and person B also has a sequence Sbi • • • S^n+i- Connecting these two sequences 
is a third sequence, the sequence of utterances. Uai is produced by A in state 
Al, Ub 2 is produced by B in B 2 and so on. Furthermore, A’s state Sa2 and B’s 
state Sb 2 are i at least partially, determined by the utterance U^i which precedes 
them. The utterances change the mental states of the participants to the point 
where no further communication is regarded by them as necessary: the goals of the 
conversation, whatever they were, have been achieved as far as is possible. This is 
represented by the diagram in figure 1 . 

Even this simple picture reveals that there are several large questions to be 
answered in order to be in a position to build a machine capable of playing the 
part of A or B: 

(i) what are mental states? 

(ii) how do they change? 

(iii) how do utterances connect with them and change them? 

■"SRI International Cambridge Computer Science Research Centre and University of Cambridge 
Computer Laboratory. 
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Figure 1: Two-person Dialogue 



1 The BDI tradition 

Insofar as the current literature on computational models of dialogue has a received 
wisdom on the answers to these questions, it is probably that given by the ‘BDP 
model of rational agency, as described for example in Cohen, Morgan, and Pollack 
(1990). The answer to the first question is that mental states are, or can be 
modelled as, sets of sentences in some logic, expressing the Beliefs, Desires, and 
Intentions of an agent (see Cohen and Levesque (1990)). Various axioms connect 
the having of desires and intentions with the performance of actions, some of which 
are linguistic actions. A rational agent, given an initial mental state, will reason as 
to the best course of action so as to fulfil the highest priority desires. Conversation 
proceeds via the performance of these linguistic actions. Part of the reasoning 
involves a model of the mental state of the other participants, and inferences about 
what their goals and intentions might be, based on the observed linguistic acts they 
carry out. 

For a partial answer to the second question, how do mental states change, if 
mental states are modelled as sets of sentences in some logic, then it is appropriate 
to turn to the belief revision literature: e.g. Gardenfors (1988), Galliers (1990). Be- 
lief revision is modelled via the addition or subtraction of propositions (if expressed 
on closures of belief bases, i.e. the deductive closure of some set of axioms) or of 
sentences (if expressed on belief bases themselves), operations which are required 
to preserve consistency. It is in the latter sentential form in which belief revision 
has to be implemented for the purposes of computational dialogue modelling, of 
course. A simple approach to belief revision within this framework would posit two 
basic operations, given a set of sentences A representing the existing mental state, 
and a sentence a , which is some component to be added or removed as the result 
of processing an utterance. 
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Subtraction: 

If A does not entail a then A’=A; 

Else, find some f3 in A such that A — (3 does not entail a, 

and A’ = A — (3 

In many cases (3 and a will be the same, or a will follow directly from (3 perhaps 
in conjunction with some other sentences which taken alone do not entail a. Of 
course (3 may be a conjunction of several different sentences. 

Addition: 

If A does not entail ->a, then A’ = A + a\ 

Else, find some (3 such that A — (3 does not entail 

and A' = (A — (3)+a 

It is worth noting that we need not just belief revision, but also revision of de- 
sires, and intentions. Conflicting goals and incompatible intentions are drivers of 
conversational processes just as much as detection of mismatches in beliefs. It is 
also worth pointing out that the mechanisms presupposed in belief revision, like 
detection of inconsistency or conflict, are required in some form for approaches that 
do not necessarily describe themselves as doing belief revision. Any approach to 
dialogue needs to be able to tell when an answer to a question is a plausible and 
appropriate one; when two goals cannot both be simultaneously achieved; or when 
some piece of information is implied by what is mutually known and therefore need 
not be explicitly repeated. Any formal mechanism that achieves this is addressing 
the problem of belief revision. 

Let us turn now to the answer given to question (iii), how do utterances relate 
to, and change, mental states? The BDI answer to this question is essentially that 
derived from the speech act literature, as presented by Cohen and Perrault (1979) 
and Perrault and Allen (1980). Characterising an utterance as a particular type 
of speech act enables it to be related to properties of the speaker’s mental state, 
by linguistic and other conventions governing that type of act. These conventions 
(‘felicity conditions’ in the original formulation) represent necessary and sufficient 
conditions for the performance of a genuine instance of a particular kind of speech 
act as in Searle (1969), and those conditions are at least in part conditions on 
the speaker’s mental state, requiring the speaker to have the right kind of beliefs, 
desires, and intentions. Thus a hearer can make inferences about the speaker’s 
mental state once an utterance has been recognised as instantiating a particular 
kind of speech act. 

Given background axioms of ‘rational agency’ characterising the behaviour of 
an ideally cooperative rational hearer, the BDI approach also has an account of 
how an utterance can change the mental states of the participants in a dialogue. 
As an illustration of the general approach, a typical story about how a request can 
lead to a change of mental state and a consequent action on the part of a hearer 
will go something like this. We assume that the speech act conditions, and the 
rational agency axioms are characterised along roughly these lines: 
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Request Precondition: 

IF Speaker wants A 

AND Speaker believes Hearer can do A 
AND . . . etc . 

THEN Speaker requests Heaxer to do A 

Request Postcondition: 

IF Speaker requests Heaxer to do A 
AND ... etc. 

THEN Heaxer believes Speaker wants A 
Axioms of ‘rational behaviour’: 

Cooperativity : 

IF Heaxer believes Speaker wants A 
AND . . . etc . 

THEN Hearer wants A 

Desire leads to Action: 

IF X wants A 

AND X can do A 
THEN X does A 

Now a typical piece of reasoning that could lead a Speaker to make a request in 
order to achieve some desire might proceed as follows: 

Speaker requests Hearer to do A 

Hearer believes Speaker wants A (Request Postcondition) 

.-. Hearer wants to do A (Cooperativity) 

Hearer does A (Desire leads to Action) 

That is, the Speaker desires that A be done and he reasons that by issuing a request 
he will start the above chain of events that results in A being done. This reasoning 
is typically done by backward chaining from the goal state, but that is really an 
implementational issue that does not affect the logic. 

1.1 Some problems for the BDI tradition 

The BDI tradition has led to many theoretical insights into the nature and func- 
tioning of dialogue, and there are several very impressive implemented systems 
based on versions of the approach: for example, those described by Allen, Miller, 
Ringger, and Sikorski (1996) or Sadek, Ferrieux, Cozannet, Bretier, Panaget, and 
Simonin (1996). Nevertheless there are several areas where the theoretical content 
is unclear or questionable, and there are many aspects of the theory which do not 
seem likely to yield satisfactory large scale computational implementations. We 
turn now to discussion of some of these problems. 

The basic propositional attitudes countenanced by the BDI tradition are those 
from which it derives its acronym: belief, desire, and intention. However, since the 
earliest formal work on dialogue it has been recognised that many of the proposi- 
tions that correspond to utterances in a dialogue do not fall easily into these three 
categories. Hamblin (1971) pointed out (p 36ff) that many sentences correspond 
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to propositions that are not (yet, anyway) believed by the participants. He intro- 
duces the notion of a commitment, which is not necessarily a belief (though it may 
become one) but a function purely of what has been said. Speakers are generally 
committed to a statement if they make it, or agree to one made by someone else, or 
if it clearly follows from something else to which they are committed. In particular 
commitments may be later retracted but not denied. 

In the recent literature other closely related terms have been used. Traum uses 
the term ‘proposal’ in Traum and Hinkelman (1992) and the idea of propositions 
that are being ‘grounded’ but not yet agreed appears in Clark and Schaefer (1989). 
For example, in the following dialogue (from the ‘Autoroute’ corpus described by 
Moore and Browning (1992)), between a ‘wizard’ pretending to be a route-planning 
system, and a caller, the proposition ‘caller wants to go to Edwinstowe’ cannot be 
said to be a belief of the wizard until at least step 4, rather than step 2, where the 
proposition is ‘in the air’. (We assume throughout that what is happening is that 
at step 3 the wizard is not sure she has heard correctly. At step 6 the system she 
is operating has reported that there is more than one Edwinstowe). 

1. w: Where would you like to go? 

2. c: Edwinstowe 

3. w: Edwinstowe? 

4. c: Yes 

5. w: Please wait 

6. w: Is that Edwinstowe in Nottingham? 

7. c: Yes 

More recently, several authors, like Traum and Allen (1994) and Bunt (1997), 
have pointed to the need to also recognise a category of ‘obligations’ or ‘social 
commitments’ which arise from linguistic and social conventions. If someone asks 
you a question, you are, as a reasonable member of the same language community, 
thereby placed under some kind of obligation to respond. 

Many other types of phenomena that are encountered in real dialogues seem to 
resist an easy classification into one of the three propositional attitudes counten- 
anced by the approach. These include what Bunt calls ‘dialogue control’ phenom- 
ena: utterances (feedback, acknowledgements, pause-fillers, etc.) whose function is 
to maintain the dialogue and coordinate the participants, rather than to directly 
express beliefs, desires, or intentions. 

These observations do not threaten the central role of beliefs, desires and in- 
tentions, of course, but they do indicate that as an empirically adequate account 
of what actually goes on in dialogues the BDI approach needs considerable supple- 
mentation and extension. The notion of ‘mental state’ provided by the theory is 
too simple to explain everything that happens in a natural dialogue. 

Let us turn now to the question of change of mental state, and the belief revision 
framework assumed implicitly or explicitly by BDI approaches. 

The classical belief revision framework (and associated approaches such as dy- 
namic logic: Groenendijk and Stokhof (1991), Jaspars (1996)), while giving a clear 
logical theory of change of information state, present many problems when large 
scale practical implementations are contemplated. As is well known, a simple 
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method of belief revision like that sketched above is very highly non-deterministic. 
Even given such simple choices for the existing set of beliefs A and a candidate for 
addition or subtraction a as: 

A = {a, a — ► b }, a = b (Subtraction) or -»b (Addition) 

there will be a choice about which (3 to remove. Practical belief revision requires 
us to assume some priority ordering on sentences in a belief base, such that given 
several candidates for elimination, the one which is ‘cheapest’ in terms of some 
overall score will be given up. This priority ordering usually corresponds to an 
intuitive notion like ‘strength of belief’ or ‘degree of commitment’. Deciding on the 
adjustment that makes the least overall change required to preserve consistency can 
be a computationally intensive operation. Note that any such system of weighting 
is not part of the logic itself and so some separate mechanism is required to make 
sure that the weighting scheme itself observes reasonable properties. 

Implementing classical belief revision of course requires us to be able to detect 
inconsistency, and thus some kind of classical negation is necessary in our logics. It 
would be impossible to do belief revision on sets of pure Horn clauses, for example. 
But this means that we have problems, not just with efficiency, but also with 
‘logical omniscience’. If the logic is strong enough to detect inconsistencies between 
complex beliefs, it is likely also to make the contents of a belief state imply logical 
consequences of basic beliefs that are actually beyond human ability to compute. 

For both of these reasons it is desirable for an implementation also to model 
something like ‘focus of attention’ or ‘salience’ of sentences in the mental state, so 
that reasoning can be restricted to relevant subsets of sentences, and conclusions 
can be limited to those that are humanly processable. However, all the obvious 
ways of achieving this notion (e.g. limiting chains of inference to a certain depth) 
compromise completeness and (global) consistency, as discussed in Konolige (1986). 
Since these are not properties that characterise human reasoning, especially in 
dialogue, this may actually turn out to be an advantage to us, but nevertheless 
it is not easy to see how to achieve exactly the right kind of restrictions without 
unwanted negative effects. 

Lastly, but by no means least, there is the fact that the classical approach to 
belief revision requires us to axiomatise the relevant properties of the domain in 
order to be able to track what follows from what. As anyone who has ever tried 
to carry out such an exercise in knowledge representation will confirm, this is an 
exceedingly difficult undertaking, especially when classical first order logic is the 
representation language. It soon becomes obvious why all the textbook examples 
are simple blocks worlds, or equally well structured and clean domains. Anything 
else is generally just too messy and hard, and the resulting axiom set is always 
very fragile and incomplete in its coverage. 

Turning now to the third of our questions, how to connect utterances with 
mental state, we also find problems with the logical reconstruction of speech act 
theory that is needed within the BDI framework. For example, many people, not 
least the original proponents of the theory, have commented on the implausibility 
of the ‘Cooperativity’ axiom (and its analogues for other speech acts). There are 
actually two problems: firstly, it does not allow for the case where the hearer 
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might not want to cooperate, or where external circumstances may bring about 
conflicting goals if he does: see Galliers (1990). It can also be the case that a 
hearer might be cooperative in some respects but not others. To some extent this 
can be alleviated by introducing some notion of defaults (although how to square 
this with the requirements of classical belief revision is not obvious). 

Secondly, and more seriously for the interpretation of the BDI account as a 
contribution to a theory of dialogue , is the fact that these axioms are, in the theory, 
the only way of achieving ‘uptake’ of a speech act; that is, of creating a link between 
an utterance by a speaker, and subsequent modification of the hearer’s beliefs or 
intentions concerning anything other than the speaker’s mental states. In many 
respects, the original speech act theory is rather solipsistic or one-sided: it deals 
with the conditions for the successful performance of some act by a speaker, but 
has virtually nothing to say about what happens next, or in fact about anything 
outside the speaker’s head. For example, as far as speech act theory proper is 
concerned, it is largely unexplained why a request is typically met either with an 
acceptance or a refusal, or why a question is typically met with an answer rather 
than (say) a request or another question. In the speech act literature, and in the 
BDI tradition derived from it, there are no dialogue units larger than a single 
utterance: a response to a request, or an answer to a question, cannot within the 
theory be distinguished from a conversation-initiating declarative. 

Also completely unexplained, even with the appropriate axioms in place, is why 
there is a pressure on a hearer to respond somehow to an utterance even if he is 
not in a position to respond appropriately to it. Requests which are not going 
to be complied with are still acknowledged; questions that cannot or will not be 
answered still evoke some kind of explanation or diversion. Complete silence is 
not an option, although it is not easy to see how that option would conflict with 
anything in speech act theory. 



2 Responses 

There have been broadly two types of response to this problem. (Actually, only 
one is a direct response; the other is more of a parallel development that can also 
be seen as offering a solution). Traum and Allen (1994) propose the addition of 
a new mechanism to a speech act-based approach, namely ‘discourse obligations’. 
A discourse obligation is a linguistically based social convention having the effect 
that when a particular speech act is recognised by a hearer, the hearer incurs an 
obligation to respond in an appropriate way: 

Speech Event Discourse Obligation 
S request A H accept or reject A 

S ask whether P H say whether or not P 
Utterance failure H repair utterance 
etc. 

Thus we now have what might be called a BDIO model: a new propositional 
attitude is added. However, the notion of a ‘speech event’ is now much wider than 
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that of a speech act: although the latter, in their original formulation at least, e.g. 
in Searle (1969), included some acts that one might think of as dialogue control acts 
rather than as BDI related. Nevertheless, even in the most ambitious formulations, 
there was no speech act of utterance failure. 

However, while this formulation begins to describe the conventional association 
between questions and replies, requests and acknowledgements, and so on, it does 
not fully capture the nature of the more general social pressure to respond that 
is characteristic of normal dialogues. For example, in cases where a politician is 
asked an awkward question in an interview, he will usually fail to obey the specific 
question-related discourse obligation described above, but he cannot just remain 
silent. What he will typically do is talk about something else that he hopes will be 
taken as a relevant response, but which does not actually constitute an answer. It 
seems plausible that there are at least two types of obligation involved in discourse: 
those which are associated with particular speech acts or utterance types (e.g. that 
a yes/ no question demands the answer yes or no), as described by Traum and Allen, 
and those which are more general social and communicative obligations, not specific 
to particular constructs, and concerned with the maintenance of communication 
norms. 

The second line of work which can be seen as addressing this particular defect of 
speech act theory is the ‘Conversational Games’ tradition: Power (1979), Houghton 
(1986), Kowtko, Isard, and Doherty (1992), Reithinger and Maier (1996). More of 
a descriptive framework than a theory, this tradition posits a set of ‘conversational 
games’ or ‘dialogue games’ each consisting of a set of moves, where an utterance 
may realise one or more moves. The important thing is that the games encompass 
both partners in dialogue: for example, a yes/no game consists of a yes/no question 
along with its yes/no reply. Thus the conventional link between utterance type and 
response type is achieved by making the unit of discourse something that by defin- 
ition is not restricted to a single utterance. This may not be a very sophisticated 
theoretical innovation, but it at least describes the facts correctly. 

Some conversational games postulated by Kowtko, Isard, and Doherty (1992) 
on the basis of study of the Edinburgh ‘map task’ corpus are: Instruction, Con- 
firmation, Question- YN, Question- WH, Explanation, Alignment. The moves can 
be broken into two categories: 

Initiating Moves: 



Instruct 

Check 

Query-yn 

Query-wh 

Explain 

Align 



(provides instruction) 

(elicits confirmation of known information) 
(asks yes-no question for unknown information) 
(asks wh-question for unknown information) 
(Gives unelicited description) 

(Checks alignment of position in task) 



Response and feedback moves: 



Clarify 

Reply-y 

Reply-n 

Reply-wh 



(clarifies or rephrases given information) 
(responds affirmatively) 

(negatively) 

(Respond with requested information) 
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Acknowledge (acknowledge and request continuation) 

Ready (Indicates intention to begin a new game) 

In principle the framework of conversational games can easily cover those utterance 
types that do not fit happily into a pure speech act framework, recognising that the 
function of some of these is to provide information about the state of the dialogue 
(e.g. alignment - making sure both partners know where they are in the dialogue) 
and to increase the degree of confirmation about some piece of information. 

A more refined characterisation of these ‘dialogue control’ acts is given by Bunt 
(1997). He distinguishes different aspects of context: semantic, cognitive, physical, 
social, and linguistic, with different types of dialogue act for each. Dialogue acts 
are acts which change one or more aspects of the context. 

Conversational Games are a useful descriptive framework. But as a theoretical 
contribution to the understanding of dialogue they have remained somewhat weak. 
Firstly, it is not clear how they differ from the BDI framework in the way they try 
to establish a link between utterances and mental states. From the perspective of 
speech act theorists, conversational games look like a hard-wiring of some of the 
patterns of inference that they derive from first principles. 

Secondly, the theory seems very unconstrained. For example, is there a satisfact- 
ory answer to the questions of how many games there are, how they vary according 
to the type of dialogue, and what constraints there are upon possible games? These 
are the kinds of question routinely asked of every other level of linguistic formal- 
ism. For example, in the Verbmobil system as described in Reithinger and Maier 
(1996), games of much finer level of detail than in the Map Task are envisaged: 
e.g. ‘arranging a time’, or ‘confirming a date’. These are justified along exactly the 
same lines as those developed for the Map task, namely, intuitive agreement that a 
certain level of commonality exists between different utterance/context pairs. But 
it is clearly not very much further down this route before there is a distinct game 
for practically every utterance. We would therefore like some theoretical grounding 
to establish what granularity is characteristic of useful games. 

To illustrate these issues, consider the question: What distinguishes a move 
from a game? One cannot simply identify games with standardised sequences of 
moves, although this is at first sight a tempting idea (and explicitly proposed in 
Houghton (1986)). For example, one might think that a WH query game should 
consist of a WH-query move followed by a WH-reply move. But if this were the 
case then we would have to say that the WH query game in the dialogue fragment 
we saw earlier would be over after turn 2, whereas intuitively one would want to 
. say that it was only completed after the two checking games. 



Move Game 



1. w: Where would you like to go? 


query whq 


WH 


2 . c : Edwinstowe 


reply whq 




3. w: Edwinstowe? 


check 


CHK 




4. c: Yes 


clarify 






5 . w : Please wait (time management) 


align /acknowledge 




6. w: Is that Edwinstowe in Nottingham? 


query ynq/check? 


CHK 




7. c: Yes 


reply yes/clarify 







o 
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Ian Lewin has suggested (p.c.) that in general we should single out those (sequences 
of) dialogue acts that serve to change the status of propositions currently under 
discussion from ‘proposals’ to ‘agreed commitments’. The boundaries marked by 
these transitions do seem, at least in these types of dialogue, to correspond to 
natural divisions in a dialogue. Thus although there is a context change between 
each of the utterances above, and there are three games played, there is only one 
significant change to the agreed commitments of the participants. Utterances 4-7 
serve to check and ground the information introduced by 1 and 2, and so although 
they do change the linguistic and other aspects of the context, there is a good 
sense in which they have a different status. Lewin points out that it is plausible, 
for example, that to the extent that dialogues are consciously or unconsciously 
planned, the units of planning are those represented by the acquisition of agreed 
propositions rather than the units that correspond to the conversational games like 
‘checking’ or ‘acknowledgement’. It is not plausible to assume that such moves are 
planned: rather, they arise as an immediate response to the current state of the 
dialogue. 



3 Conversational Games Reconstrued 

Let us reconsider what a notion of conversational game might tell us about the 
answers to the three questions with which we began our investigation. In par- 
ticular, we will explore a somewhat different, and in some ways more traditional, 
interpretation of the notion of a ‘game’. 

We consider (task-oriented) dialogues to be a kind of game whose goal is to 
achieve the purposes of the dialogue (e.g. booking an airline ticket, planning a car 
journey) usually as quickly and economically as possible. A suitable example game 
to explain the analogy might be a card game like bridge. The players are in the 
position that a certain amount of information about the hand that the other player 
has is overtly available via the content of utterances, but the rest has to be inferred 
on the basis of bid behaviour and knowledge about cards. Some good reasoning 
or lucky guesses may lead to a speedy conclusion of the game. But a bad guess 
might put one at a disadvantage. So each move has to be made with an eye to 
its possible positive or negative effects. In formal decision theory, the effects are 
of course called ‘utilities’, and each move is calculated (if the player is rational) 
to maximise utilities. Moves are seldom made simply in response to the previous 
move by the opponent (although sometimes this is necessary, as when to move a 
king out of check) but are more often part of a longer range strategy. 

Pursuing the analogy at a more detailed level, then, our conversational game 
framework requires at least the following components: 

(i) move interpretation: when a player puts down some cards, we use that 
information to work out what other cards the player may have, or may want. 
The conversational game analogue of this to the classification of an utterance as 
a realisation of one or more conversational moves. Classifying an utterance as a 
move is making one hypothesis about the speaker’s mental state. Equally, one may 
make further hypotheses about what it is reasonable to think led to that particular 
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move being made. 

(ii) tactics: planning the next move in response both to the immediate situation 
but also the longer range strategy. In some cases the immediate situation may be 
the most important factor, as when moving a piece to avoid capture, or requesting 
clarification when an utterance has not been recognised with sufficient confidence, 
or when it presents the belief revision component with an apparent contradiction. 
But if things are going according to plan, the next move is both an appropriate 
response to the previous one, and a step forward in the overall plan. 

(iii) strategy: planning the next game or sequence of games to be played in 
order to win the dialogue. Strategy needs to be continually re-evaluated as new 
information is obtained. 

(iv) a fourth but vital component is that knowledge of the domain which sup- 
ports the various types of reasoning in (i-iii). 

As the computational underpinning of all four components we intend to ex- 
plore the use of Bayesian Networks, as developed by Pearl (1988), and described 
in Neapolitan (1990), a formalism which is becoming widely used in the AI com- 
munity for knowledge representation, causal reasoning, belief revision, and decision 
theoretic reasoning. There is little doubt that, modulo some important provisos 
below, this formalism can provide a plausible platform for component (iv) and so 
in the remainder of this section we concentrate on (i)-(iii). 

4 What is a Bayesian Network? 

Given a probability space of events, E, a ‘propositional variable’ is a function from 
E to a finite subset of E of mutually exclusive and exhaustive events. Given a 
propositional variable A, let a\...a n be the set of possible values of A. We write 
P(A = a,i) as P(ai) and an expression like P(A \ B) = P{A) is a shorthand for the 
expressions P(di \ bj) = P(di) for all i and j. Given a set of propositional variables 
A, B, C, ... we can define a joint probability distribution on them such that: 

^ ^ •P(fti) frji > •••) = 1 

ijk... 

Given a set of such variables, {Xi...A n }, the ‘marginal probability’ of any subset 
of them, say Xi,...,Xj y relative to this joint probability distribution is defined as: 

P(X u ...,X j ) = W-*«) 

A Bayesian or causal network is a set of propositional variables, associated with 
vertices in a directed acyclic graph, where there are conditional (in)dependencies 
between some of the variables, as reflected pictorially in the associated graph. More 
formally, a DAG consisting of vertices/variables V and edges E, with an associated 
joint distribution P, constitutes a Bayesian network under conditions below. 

First, given a variable v which is a member of V, let c(v) (‘causes of v’) be the set 
of v's parents, let d(v) be the set of v's descendants, and let a(v) be V - ( d(v ) (J v), 
that is, all the variables except v and v’s descendants. 
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Let W be any subset of a(v). W and vare conditionally independent given c(v), 
where P(c(i/)) 0, under three conditions: 

if P(v | c(u)) = 0 

- because nothing further (in particular W) can affect v 

if P(W | c{v)) = 0 

- because nothing further (in particular v) can affect W 

ifP(v\W[)c{v))=P(v\c{v)) 

- which can be verified by calculation 

If every subset of W is conditionally independent of v given c(v) then the DAG is 
a Bayesian network. 

The thing to notice about conditional independency is that although some in- 
dependencies will be permanent because of the configuration of the network, and 
will not be affected by instantiation of variables (i.e. when it is known which value 
of the propositional variable =1), some variables will become independent of each 
other only when some intervening variable has been instantiated. 

The conditional independencies in a network can be exploited to reduce the 
amount of computation involved in working out joint probabilities. Take for ex- 
ample, a network of the form 




Figure 2: bayesian net 

The ‘chain rule’ of probability theory tells us that the joint probability can be 
calculated from conditional probabilities thus: 

P{A,B,C,D) = P(A | B,C,D) x P{B \C,D)x P{C \ D) x P(D) 

In order to fully exploit this equivalence we must reorder the variables so as to 
reflect the structure of the DAG. 
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P(D, C, A,B) = P(D | A,C,B ) x P(C \ A,B) x P(A \ B) x P(B) 

Now we can use the structure of the DAG to determine the conditional inde- 
pendencies: A and B have no parents so they are not dependent on any other 
variable: thus P(A \ B) = P(A). Variable C is dependent only on the value of B 
so P(C | A,B) = P{C | B ). Variable D is dependent only on the value of A and 
of C (since any effect of B has to be via C) and so P(D \ C, A , B) = P(D | C, A). 

P(D, C, A, B) = P{D | A,C) x P(C \ B) x P(A) x P(B) 

For larger networks, this simplification avoids many unnecessary computations. 
Now the general version of the chain rule for Bayesian networks can be written: 

| c(vi))whereP{c{vi)) ^ 0 



This enables us to compute the joint distribution from the conditional probabilities. 
We can also compute the conditional probabilities given the joint distribution, using 
the chain rule in the other direction: 



P(A | £,C,D) = 



P{A,B,C,D) 

P{B,C,D) 



and so on. 

When a variable (usually one with no parents or no children) is instantiated, 
i.e. when we know which of its values is the observed one, the probabilities in the 
network have to be updated. This is done by propagation from the instantiated 
variable. The probability of each variable V can be calculated by combining the 
evidence for V from the nodes above it in the network, and those from below: let 
the evidence from the parents of V be E c and the evidence from the daughters of 
V be Ed . Then: 



P(V I E c ,E d ) = ramxPp'IR) 



where a is a normalising constant. 

The precise algorithm for computing these quantities and for propagating their 
effects throughout the relevant portions of the network is very complex, since a 
node may have many parents and many children and allowance has to be made for 
the mutual effect of new information on any of these. The algorithm assumes that 
networks are ‘singly connected’ i.e. that for any pair of nodes there is only one 
path that can be found between them (ignoring directions on arcs). This is not a 
limitation in principle because any multiply connected graph can be transformed 
to a singly connected one, although at some resulting computational cost. 

The original version of the algorithm can be found in Pearl (1988); a very 
detailed tutorial description can be found in Neapolitan (1990); and a simplified 
version restricted to tree-shaped networks is given in Shoham (1994). 
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One serious restriction that Bayesian networks impose is that the variables are 
propositional; i.e. they have only a finite number of atomic values. This means, 
in effect, that quantificational reasoning, or reasoning that depends on the internal 
structure of propositions, is not directly possibly. However, the networks can be 
very large: many applications use networks of tens of thousands of nodes each with 
a large number of values, and so for many practical purposes this restriction does 
not begin to bite. Propositions of the form ‘pred(A,B)’ can be modelled as a node 
‘pred’ with values in the product A x B of all relevant A and B values. Implement- 
ational devices can be used to keep this kind of thing manageable. An alternative 
is to generalise a potentially infinite number of propositions to a ‘proposition type’ 
which stands for all of them, if the differences between tokens are not important. 



5 Bayesian networks for move recognition 

When a hearer categorises an utterance as realising a conversational move there 
are presumably several factors taken into account in making this decision. Firstly, 
the linguistic form and content of the utterance is important: for example, it 
is very unusual for an utterance of ‘no’ to be interpreted as realising a ‘reply-y’ 
move, (although possible if enough intonational cues are given to signal a non-literal 
interpretation). Secondly, the previous few moves, or perhaps the recognition of the 
game currently being played has to play an important part. (In some approaches 
it is the only factor taken into account: see Reithinger and Maier (1996)). Thus 
an utterance of ‘OK’ might be interpreted as a ‘reply-y’ move if the previous move 
was a ‘query-yn’, but if the previous game has been seen to be completed it is 
more likely to be a ‘ready (for a new game)’ move. Thirdly, knowledge about the 
speaker’s mental state is relevant: if the hearer knows that the speaker should 
know, or has at least been told, P, then an utterance which looks superficially like 
a ‘query-wh’ or ‘query-yn’ move is probably more likely to be a ‘check’. If it is 
categorised as a check then that hypothesis in turn would weaken the likelihood 
that the speaker is certain of P: you don’t check things you are certain of. 

We can illustrate this with an example from the ‘Autoroute’ domain described 
in Lewin, Russell, Carter, Browning, Ponting, and Pulman (1993) and Lewin and 
Pulman (1995). In this domain a person interacts with a system to plan an auto- 
mobile route between places within the UK. The relevant parameters are start and 
end of journey, with optional information like type of car, stops on the way, whether 
to optimise for speed or distance, avoid or follow motorways etc. The system en- 
gages in a dialogue to instantiate as many of these parameters as possible and then 
sends the information to a commercial PC package (described in (NextBase 1991)) 
which calculates the optimal route. 

We can encode the observations described earlier into a network representing 
the influence of these factors on the recognition of conversational moves. Assume 
that there are only a finite number of types of proposition Pi...P n which can arise 
in our domain (which will usually be the case for the kind of simple task-oriented 
dialogues we are considering, even though there may be an infinite number of ways 
of expressing them). They will be simplified representations of the propositional 
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content of actual utterances, for example: 

destination=cambridge; no; ok; origin=what; etc. 

We will assume the variables and values described below (these are just for illus- 
tration: in reality, determining the precise form of the network can only be done 
in conjunction with a close analysis of the corpus dialogues). 

Pr: Previous-move = query-yn(PJ,reply-n,.... 

C: Content and form = positive, negative, ynq(P;),whq(P^),dcl(P;) 

K: S-knows-P = yes, no, maybe 
M: Current-move = query-yn(P;),reply-n,.... 

We also want the results of a particular move classification to feed into an updated 
model of the speaker’s current beliefs. This can be achieved by using the move 
categorisation network to provide evidence that instantiates a value of a variable in 
another network. We indicate this in the diagram below by a dotted line connection 
between node K and an independent subnetwork representing the hearer’s beliefs 
about the speaker’s beliefs. 




Figure 3: Bayesian net for move recognition 

Having decided on the structure of the network, we need to assign a priori prob- 
abilities to the various values of the variables. This should be done on the basis of 
statistics derived from annotated corpora, although in many applications estimates 
of probabilities derived from experts have proved to be quite accurate. In our ex- 
ample, we will assume that the top three nodes may have a fairly uniform initial a 
priori distribution on them, reflecting the fact that in the absence of any evidence, 
there is no previous move more likely than any other, no proposition more salient 
than any other, and no hypothesis about the other’s beliefs more detailed than 
any other. However, we can provide some conditional probabilities which express 



O 

ERJC 



22 



16 



Conversational Games, Belief Revision and Bayesian Networks 



a priori dependencies between particular values of M and its parents, e.g. 

P(M=query-yn(Q) | C=ynq(Q) ,K=yes , . . . ) = very low 

The probability that a yes-no question about Q realises a ‘query-yn’ move when 
the other is believed to already know the answer to Q is very low. You don’t ask 
questions about things you already know. (Notice that there is a clear connection 
here with the notion of preconditions for speech acts. The analogous link between 
preconditions and moves is reflected in the assignment of probabilities). 

P (M=query-yn(Q) |C=ynq(Q) ,K=no, . . .) = high 

A yes-no question is more likely to realise a query move if the speaker is believed 
not to know the answer already. 



P(M=check(Q) | C=ynq(Q) ,K=no, . . . ) = low 

P(M=check(Q) |C=ynq(Q) ,K=maybe, . . . ) = a bit higher 

P(M=check(Q) |C=ynq(Q) , K=maybe ,Pr=reply-wh(R) ) = pretty high 

. . . etc . 

A yes-no question is most likely to be expressing a checking move if the speaker 
may not know the answer and the previous move was a reply to a question. 

P(M=ready |Pr=query~yn,C=ok, . . .) = very low 

The probability that ‘yes’, or ‘ok’ realises a ready move when the previous move 
was a query is rather low. 

P(M=ready I Pr=reply-n,C=ok, . . . ) = quite high 

The probability that ‘yes\ or ‘ok’ realises a ready move when the previous move 
was one which can close a game is quite high. 

On the assumption that we have a complete and plausible set of probabilities 
like this we can give a hypothetical illustration of how such a network might be 
used in the first few turns of our illustrative sequence. 

The basic cycle (from the point of view of one person, here the user) is: 

1. instantiate C (and Pr and K - from the speaker’s belief network- if 
possible), update probabilities. 

2 find the value of ‘move’ that maximises P(M = move \ Pr,C,K) 

3. instantiate M to ‘move’, propagate revised probabilities. 

4. find value of ‘k’ that maximises P(K = k | Pr,C,M), and feed into 
speaker’s beliefs sub-network. 

5. re-initialise main network, and go to 1. 




23 



Pulman 



17 



Of course, we also need to provide for the user making their own move, and 
updating other networks as well. 

We can illustrate with our earlier example dialogue: 

1. w: Where would you like to go? 

We assume for illustration that this is the opening move and so Pr is not instanti- 
ated. C is instantiated as ‘whq(destination)’. K is not yet instantiated. We will fur- 
ther assume that given the a priori probabilities the most likely move for for a wh- 
question under these circumstances is a wh-query, and so that is the answer at step 

2. We now set the value of M to ‘wh-query’, and propagate the resulting probability 
changes. Given our estimated conditional probabilities and the new instantiated 
nodes the value for k that maximises P(K — k | Pr, C = whq , M = wh — query) 
will be ‘no’, and so the proposition that, at that stage in the dialogue, the speaker 
does not know the destination is added to the record of beliefs built up in the 
subnetwork. 

The user then plans and executes his own move (exactly how this is done we 
will return to below): 

2. c: Edwinstowe (reply-whq) 

Back comes the reply: 

3. w: Edwinstowe? 

This time round the cycle, Pr=‘reply-whq’, C=ynq(destination=Edwinstowe), and 
K is ‘yes’ for the proposition ‘destination=Edwinstowe’. This latter value we as- 
sume to be a consequence of the user’s previous reply: normally you would expect 
someone to know something they have just been told. This information can be 
recovered from the ‘speaker’s belief’ subnetwork. 

We assume that given the probabilities above, the most likely move assignment 
for this yes- no question is as a ‘check’ rather than a genuine question. So we now 
instantiate M for this value. Recalculating probabilities the most probable value 
for K with respect to these instantiations should now be ‘maybe’ rather than ‘yes’ 
and this can be used to update the record of speaker beliefs being built up. 

5.1 Choosing the next move 

Bayesian networks can be extended so as to represent information not only about 
probabilities, but also utilities attached to the consequences of particular actions. 
This enables the integration of reasoning about the probability of an effect along 
with the desirability of that effect. Utilities can be combined and propagated by 
essentially the same algorithm as is used for probabilities (Neapolitan (1990) esp. 
Chapter 9). 

Bayesian networks extended in this way are usually referred to as ‘causal influ- 
ence diagrams’. To the set of nodes representing propositional variables we add one 
or more ‘decision’ nodes, representing a choice about whether or not to perform 
an action, and exactly one ‘value’ node, where all the utilities associated with the 
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different actions are represented. If there is more than one decision node, later ones 
must be dependent on earlier ones. 

As an illustration, we will take a network representing the decision whether to 
start a new game, or to check the previous move. There are consequences associated 
with these choices, and also there are several factors which we want to influence 
the choice that is made. The consequences of choosing to check will typically 
be that the overall dialogue will take longer. However, there is a lesser risk of a 
misunderstanding or an error causing problems later on. Going on to a new game 
will typically speed up the dialogue, but if the previous piece of information has not 
been properly ‘grounded’ then it may turn out to be insufficient to proceed at some 
later stage. For example, in our illustration, the wizard might have got the wrong 
‘Edwinstowe’, leading to either an inaccurate route, or a later repeat of most of 
the dialogue. If speed is important, it might be preferable to move to a new game 
as soon as possible provided there is reasonable confidence that understanding has 
been achieved, whereas if accuracy was preferred to speed, frequent checking moves 
and a more cautious dialogue style would be called for. 

The decisions have consequences, but the consequences might also be dependent 
on other causal factors. For example, if the environment is a noisy one, or the 
speech recogniser is unreliable, it may be that frequent checking will lead to a 
better accuracy/speed ratio than a less cautious strategy. Thus a decision will be a 
calculation based on the likelihood of the effects given the prevailing circumstances, 
and the utilities associated with those different possible outcomes. 

We can illustrate this with the following partial network for deciding which 
move to make next. In this network we have to choose to check the last move, or 
start a new game. The causal consequences of this decision are represented by a 
‘speed’ node, saying whether the dialogue is likely to be completed quickly or not, 
and an ‘accurate’ node, saying how likely it is that the route given is actually the 
one asked for. 




Figure 4: Bayesian Net for Move Choice 

The round nodes are propositional variables, as before. (In the ‘causal influence 
diagram’ literature they are referred to as ‘chance’ nodes). The square one is a 
decision node, representing the choice of possible actions. Although in our example 
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this is not the case, chance nodes can have arcs to decision nodes. 

The value node is represented as a diamond. The value node can be regarded 
as a propositional variable which contains one utility value for each possible com- 
bination of its parent nodes. These utilities can be computed from those assigned 
to the parents, or assigned directly. (The fact that there is only one such node 
makes the DAG look as if is no longer singly connected. But provided that the 
subgraph consisting of the chance nodes remains singly connected that does not 
matter, since the value node does not affect any of the probabilities). 

We assign probabilities to chance nodes as before. The probabilities depend 
both on parent chance nodes, and on whether a particular decision is taken. Thus, 
for example, the assignment of probabilities to the ‘speed’ node will depend on 
what decision was taken, and on how reliable the communication channel is. The 
precise values we assign to these probabilities do not matter for the sake of the illus- 
tration, but we would want the probabilities and the utilities to obey the following 
constraints: 

P(fast | check, noisy) > P (fast | check, "'noisy) 

A check is more likely to lead to an overall speedup in a noisy environment. 

P (accurate | check, noisy) > P (accurate I newgame , noisy) A check is more 
likely to maximise accuracy in a noisy environment than moving on to a new game. 

U(f ast , accurate) > .... > U(slow, "accurate) We prefer fast, accurate 
dialogues. Slow inaccurate ones are of course the worst of all worlds. 

Given a network like this with utilities and probabilities assigned, we can cal- 
culate for any action its expected utility with respect to the current instantiations 
of chance nodes. Let a be an action (i.e. a value of the decision node), and let the 
the current instantiations of chance nodes be represented by e (^evidence). The 
value node will describe the utility of the causal effects Ci... n of each action, where 
the notation for such a utility measure is u(ci): 

U(d) = II «( C ») X P (°i I a > e ) 

i 

Now we choose the action that has the maximum overall utility under the circum- 
stances, execute the conversational move corresponding to it, and update variables, 
etc. We would hope that in our example scenario, given the circumstance that the 
reliability of the communication channel is low, and that the utility that is to be 
maximised is accuracy, then the decision that would score the highest would be to 
do a checking move rather than begin a new game. 

5.2 Higher level planning 

So far we have seen how it is possible to recognise utterances as realising particular 
conversational moves, and how to select the maximally useful next move, while 
updating and combining information of several different sorts. Using Bayesian 
networks, augmented with utility calculations, offers the promise of being able 
to model locally rational conversational behaviour in a way that has not so far 
proved possible in practice on a large scale for the traditional BDI-based systems. 
However, while a system based upon the components we have sketched so far would 



20 



Conversational Games, Belief Revision and Bayesian Networks 



be a satisfactory ‘reactive’ system, we have not yet shown how to reproduce the 
higher levels of strategic planning that are one of the strong points of the traditional 
architectures. 

However, at least as far as relatively simple task-oriented dialogues of the 
Autoroute, Verbmobil, or ATIS types are concerned, it seems quite possible to 
extend this scheme to completely replace the traditional types of planning that 
most dialogue systems rely on for their overall strategy. The analogy here is with 
the use of decision networks in expert systems, particularly medical diagnosis sys- 
tems. Here the diagnosis does not necessarily proceed by going through some fixed 
sequence of questions; rather, the most informative next question is chosen dynam- 
ically by testing to see which propositional variable it would be most useful to know 
the value of. For example, knowing the age of a patient is a very important piece 
of information, even if not directly relevant to a diagnosis. If the patient is a child, 
questions about level of alcohol intake are unlikely, even in these times, to yield 
much diagnostically relevant information. Thus the utility of asking a question 
about age may be high in terms of speedy diagnosis, even though the answer itself 
may not be directly relevant. 

In the case of our Autoroute domain, we might have variables corresponding 
to the main parameters of an Autoroute query: start, destination, car type, etc. 
It is difficult to think of an assignment of utilities that is not rather trivial: for 
example, we clearly need to know the start and the destination, and so the utilities 
associated with those variables should be higher than those of e.g. car type. Also, of 
course, the utility of asking questions about the values of variables that are already 
instantiated is likely to be very low. However, we might complicate the picture 
by making the utility of some variables dependent on the values of others: for 
example, if we know that the user has a fast car, then it is probably less important 
to ask whether he is interested in a scenic route for his journey. If the user wants 
to avoid motorways, then he is probably not interested in the fastest as opposed 
to the shortest journey. 

Given a decision network having the general form of those above, and encoding 
these specific dependencies and utilities, it is possible to decide which propositional 
variable should be sampled next by the following means. 

1. Given a variable with m values: then for action a, evidence e, causal 

effects Ci... n of a, we can calculate the utility of an action with respect to the value 
of a variable by the following expression: 

U(a | Vi) = JJ u(Cj | a, Vi) x P(C, | a,e,Vi) 

j=l..n 

This expression is related to that used earlier for calculating the utility of a move: 
the difference is that there, the relevant variable, V, was assumed to be instantiated 
already. 

2. Now we can define the utility for each value of V as: 

U(Vi) = max a U (a \ V x ) 

This expression tells us the maximum utility that can theoretically be derived from 
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this value of the variable. 

3. Now the overall utility of querying V can be computed by summing the product 
of the utilities and the likelihood of realising them under the current set of evidential 
instantiations: 



n P(V l \e)xU(V i ) 

i=l... m 

Performing these computations for all uninstantiated variables will allow the 
most useful one to be questioned next: the variable that has the highest overall 
possible utility in the current circumstances is a rational choice for the subject of 
the next question. Of course, in a large network these calculations might be rather 
expensive: a practical method might involve some kind of stochastic sampling of 
variables rather than an exhaustive comparison. 



Related Work 

The notion of ‘language game’, ‘conversational game’ or ‘dialogue game’ has a long 
history in 20th century philosophy of language, starting with Wittgenstein. Games 
interpreted in a decision-theoretic way have also been used within philosophy of 
language, notably by Hintikka, although within computational linguistics this line 
of enquiry is probably best known, in one version at least, through the work of 
Carlson (1983). However, the most direct inspiration for the approach described 
here is a paper by Gamback, Rayner, and Pell (1991), in which they describe 
a hybrid rule-based/neural network approach to the pragmatics micro- world of 
bidding in bridge, in which bids are seen as various kinds of simple speech act. 

Bayesian Networks have been used in natural language processing for story 
understanding (see, for example, Charniak and Goldman (1991)) and word-sense 
disambiguation. They have also been used by Araki, Kawahara, and Doshita (1995) 
for dialogue understanding, although in a somewhat different way than envisaged 
here. The system they describe uses two networks: one ‘Conversational Space’ 
network is responsible for hypothesising the interpretation of an utterance and the 
associated speaker intention. It combines syntactic, semantic, and discourse struc- 
ture information into a single network, which is constructed dynamically for each 
new utterance. The second network (‘Problem Solving Space’) encodes a model 
of the task domain and is responsible for plan recognition, and for selecting the 
appropriate type of response. Other mechanisms (e.g. utterance type trigrams) are 
also used, and ‘mental state’ is modelled separately, apparently not by a Bayesian 
network. Explicit utilities and the framework of causal influence diagrams are not 
used. 



Conclusions 

We began with three questions that should be answered by any satisfactory com- 
putational theory of dialogue. It is worth spelling out the kinds of answers that 
are given to these questions by the framework we have sketched. 
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(i) what are mental states? - in the Bayesian network approach, mental states 
are represented by sets of propositions linked by causal (or logical) relations, with 
a probability distribution on them that respects these causal relationships. There 
is a straightforward interpretation of these networks as networks of beliefs, and 
indeed that is how they were originally envisaged in Pearl (1988). When Bayesian 
networks are augmented with the apparatus of decision and value nodes, and util- 
ities, then it is plausible to think of them as modelling some aspects of desire and 
and possibly intention, although the correspondence is not exact. This type of 
Bayesian reasoning is less powerful than that assumed in classical belief revision or 
associated frameworks like dynamic logic. Quantificational reasoning, beliefs about 
beliefs, etc. can only be handled to the extent that they can be ‘compiled out’ to a 
propositional format. However, classical BDI implementations have not been able 
to actually make use of this extra power on a large scale yet and so it remains to 
be seen whether this is a serious practical constraint. 

(ii) how do they change? - states change by the instantiation of nodes represent- 
ing new evidence, and the consequent updating of probabilities. Conflict between 
beliefs or intentions in the presence of new input is not modelled explicitly, but 
can be associated with large differences between a priori probabilities and val- 
ues derived from new evidence. Evidence from multiple sources can be combined 
unproblematically. 

(iii) how do utterances connect with them and change them? - the connection 
between utterance types and mental states is conventionalised via conversational 
games, and this conventional connection is encoded in the structure of the relevant 
networks for move recognition and response. Many of the insights of speech act 
theory and the BDI tradition are retained and encoded in this way, although their 
interpretation is now partly probabilistic rather than strictly logical. 

Clearly, there is great deal of work to be done before the preceding ideas can 
be implemented and tested in detail. However, we regard this as a promising 
perspective from which to approach the problem of building dialogue understanding 
systems. The Bayesian network architecture seems to provide the right combination 
of rule based and statistical methods. We can retain what is intuitively correct 
about the BDI tradition, while overcoming the difficulties and fragilities associated 
with strictly axiomatic systems. 

One obvious question of course, is: where do the networks and their associated 
probabilities come from? Although it is possible in principle to learn the structure 
of a Bayesian net from examples, we feel that it is more productive at least in the 
short term to think of their basic structure as reflecting (corpus-guided) linguistic 
descriptions of conversational game and move structure. However, the probabilities 
associated with the nodes in a network should reflect observed properties in a 
relevant corpus, and it is quite plausible to think of these as being automatically 
trained from an annotated corpus. 
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Valence Alternation without Lexical Rules 



Gosse Bouma* 



Abstract 

Valence changing lexical rules are a problematic component of constraint- 
based grammar formalisms. Lexical rules of this type are procedural, require 
defaults, and may easily lead to spurious ambiguity. Relational constraints 
can be used to eliminate such rules. The relational approach does not require 
defaults, is declarative, avoids spurious ambiguity, and can be an integrated 
part of a hierarchically structured lexicon. This is illustrated below for the 
complement extraction and adjunct introduction lexical rules of HPSG. We 
argue that, apart from the technical benefits mentioned above, the relational 
approach is linguistically superior, in that it offers a uniform account of com- 
plement and adjunct extraction. Furthermore, it eliminates the spurious am- 
biguity that may arise in grammars which include complement inheritance 
verbs as well as a lexicalist account of complement extraction. 



Introduction 

Recent work in HPSG has argued for lexicalist approaches to complement extraction 
(Sag 1995), adjunct selection (Miller 1992; Iida, Manning, O’Neill, and Sag 1994; 
Manning, Sag, and Iida 1996), and clitic climbing (Sag and Miller, to appear). Lex- 
icalist accounts treat these phenomena as valence variation. That is, complement 
extraction requires that each head selecting for an extractable complement C has 
a counterpart which does not select for C but instead includes C in its SLASH-set. 
Similarly, lexicalist adjunct selection requires that heads may include (an arbitrary 
number of) adjuncts on their COMPS-list. Lexicalist accounts of clitic climbing, 
finally, require not only that lexical heads may realize some of their complements 
as phonological clitics, but also that these heads have a COMPS-list which is the 
append of the list of elements the head subcategorizes for and the COMPS-list of one 
of these elements. That is, verbs allowing for clitic climbing must be complement 
inheritance verbs. Note that both the phonological realization of complements as 
clitics and complement inheritance lead to valence alternations. 

The proposals cited above all rely on lexical rules to account for certain system- 
atic alternations in lexical entries. The central role of lexical rules is remarkable, 
given the fact that lexical rules are often seen as more or less ad hoc , procedural, 
extensions of the formalism, whose formal status is far from resolved. 

At the same time, it has been customary in HPSG to employ relational, recursive, 
constraints. That is, in accounts of phenomena such as extraction, complement 
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inheritance, quantifier scoping, and word order, the values of list and set-valued 
features are routinely defined using relations such as member, delete, append, 
and (sequence) union. Lexical rules in particular, are often defined using such 
relations. 

Finally, the treatment of valence in recent versions of HPSG is more subtle than 
in the versions presented in Pollard and Sag (1987) and Pollard and Sag (1994, 
chapters 1-8). Following Borsley (1989), it is now customary to distinguish sub- 
jects from other complements by means of the two valence features SUB J and COMPS 
(replacing the single SUBCAT feature used before). Furthermore, valence is distin- 
guished from argument structure, represented by the feature ARG-ST. Argument 
structure contains the list of elements which a lexical sign selects for and is the 
level of representation to which the binding principles apply. While in the canon- 
ical case ARG-ST will correspond to the append of SUBJ and COMPS, this is by no 
means always true. In Manning and Sag (1995) it is observed that phenomena such 
as passive, ‘pro-drop’, and syntactic ergativity in a number of languages can be seen 
as evidence for several non-canonical relationships between valence and argument 
structure, providing evidence for a level of representation independent of valence. 
Note also that complement inheritance verbs will typically contain (inherited) ele- 
ments on COMPS that do not correspond to arguments of that verb. Lexicalized 
extraction, finally, implies that some (non-subject) elements on ARG-ST will not be 
present on COMPS, but are included in SLASH instead. 

In this paper, it is argued that the distinction between valence and argument 
structure allows valence changing lexical rules to be eliminated. Valence alterna- 
tions are captured instead by general, possibly recursive, constraints defining the 
mapping between argument structure and valence. We demonstrate this in some 
detail for complement extraction and adjunct introduction. Other valence chan- 
ging lexical rules (such as ) can in principle be replaced by constraints in a similar 
fashion. 1 

Approaches to valence variation using relational constraints have been proposed 
by, among others, Kathol (1994) and Frank (1994). The current proposal, however, 
allows recursive constraints, and thus it can account for complement extraction 
(requiring arbitrary elements on ARG-ST to be realized as gaps ) and adjunct in- 
troduction (requiring the insertion of an arbitrary number of adjuncts on ARG-ST 
(and COMPS)). Furthermore, the constraints proposed below do not require the 
introduction of additional features. Instead, all constraints apply to independently 
motivated features, leading to a tight integration of the constraint system with the 
overall architecture of HPSG. 

The connection between lexical rules and relational constraints was first noted 
in van Noord and Bouma (1994). By viewing lexical rules as relational constraints, 
delayed evaluation techniques can be used to solve the computational problems 
posed by recursive lexical rules. However, the constraints in van Noord and Bouma 
(1994) hold between full-blown lexical entries (i.e. signs), whereas below we use 



Hn Sag and Miller (to appear), for instance, an account of French clitization is presented 
which is directly compatible with (and inspired by) the approach outlined below in that it defines 
the realization of certain elements on ARG-ST sis clitics by means of a constraint on the mapping 
between arG-ST and COMPS, instead of by means of a lexical rule, as in previous proposals. 
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constraints to relate only specific features within a sign. Since all constraints apply 
to the same sign conjunctively, the issue of rule-ordering, which was solved in van 
Noord and Bouma (1994) by hard-wiring the order of rule application into the con- 
straints (see Meurers and Minnen (1995) for an alternative approach), disappears. 
Also, the need for default sharing of information between input and output of a 
lexical rule disappears. 

Below, we present an example lexicon fragment, in which both lexical inherit- 
ance and lexical rules are used. We point out a number of problematic aspects of 
these rules in a constraint-based setting. In section 2, we redefine the fragment 
by replacing lexical rules with relational constraints. We demonstrate that the 
constraint-based fragment naturally leads to an account of complement extraction 
which subsumes the possibility of adjunct extraction. In section 3, we argue that 
the kind of spurious ambiguity noted in Hinrichs and Nakazawa (1996) does not 
arise in our proposal. 



1 A lexicon fragment with lexical rules 

We present a lexicon fragment for verbs which uses inheritance, constraints, and 
lexical rules. We point out various problematic aspects of this set-up. 



The basic lexicon 

A definite clause specification for the basic lexical entries of a small lexicon fragment 
is given in fig. 1. The unary predicate basic-entry defines the set of basic lexical 
entries in the language. A basic entry is of type word , and can be a major category 
(i.e. v, n, etc.) entry satisfying the slash-amalgamation constraint (introduced 
below). A verbal major category must satisfy verbal-subcat and map-args. The 
first defines the various verbal subcategorization types, whereas the latter defines 
the mapping between argument structure and valence. 

Following Manning and Sag (1995), we assume that different verbal subcat- 
egorization types differ only in their argument structure, and that the values of 
the valence features are defined by means of a general mapping constraint. This 
is the task of the relational constraint map-args. Canonically, the first element 
on ARG-ST is the subject, while the rest is equal to COMPS (’(’ connects the head 
and tail of a list). (Alternative definitions are considered below.). By combining 
the definitions of verbal-subcat, verbal-lex, and map-args, we can for instance 
derive the following fact: 



PHON 


hates 


HEAD 


V 


ARG-ST 


( CD NP;, {2} NPj ) 


SUBJ 


(B) 


COMPS 


(ID) 


CONT 


hate! {i, j) 
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basic-entry(0 



word 

ARG-ST 0 
SLASH 0 



major([D) A slash-amalgamation(0, CD) 



major([H 



HEAD V 
ARG-ST 0 
SUBJ 0 
COMPS 0 



verbal- sub cat (0) A map-args(0, 0, 0) 



verbal- subcat ( 



PHON 0 
ARG-ST ( NPt ) ) <- 
CONT 0(z) 

verbal-lex (intrans ,0,0) 

PHON 0 
ARG-ST ( NPt,NPj ) 
CONT 0(M) 

verbal-lex (trans ,0,0) 



verbal- subcat ( 



verbal-lex(instrans,s/eeps,s/eep') 

verbal-lex(trans,/ia£es,/ia£e') 

map-args({ 0 | 0 ), ( 0), 0 ) 

si ash -amalgamation^ ), 0) 

slash-amalgamation({ [ SLASH 0 ] | 0 ), 0 W 0) <— 
slash-amalgamation(0, 0) 

Figure 1: A fragment of the basic lexicon 
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The two main features of the lexicalist approach to extraction presented in Sag 
(1995) is the elimination of the NONLOCAL FEATURE PRINCIPLE in favour of a 
lexical slash amalgamation constraint and the elimination of traces in favour of a 
lexical complement extraction rule. Slash amalgamation requires that the SLASH- 
value of a basic lexical entry is the set-union of the SLASH-values of its arguments. 
The slash-amalagamation constraint implements this by recursively traversing 
the list of elements on ARG-ST, and unioning all the SLASH values: (l±J denotes 
(non- vacuous) set-union). Slash amalgamation makes the NONLOCAL FEATURE 
PRINCIPLE superfluous, as SLASH can simply be shared between head and mother in 
phrases without a filler daughter, while SLASH is subject to rule-specific constraints 
in head-filler phrases. An example of slash amalgamation at work is given after we 
have introduced the lexical rule for extraction. 

The basic lexicon incorporates the following notion of lexical inheritance: A 
basic entry has various major category entries as its subclasses. All of these sub- 
classes must satisfy slash-amalgamation. Similarly, the verbal major category 
class has various verbal subcategorization types as subclass. All of these must 
satisfy map-args. Thus, the unary predicates in general define subclasses of the 
general class basic-entry, whereas the other predicates define constraints which 
must hold for the class in whose antecedent the predicate appears. 

Adding lexical rules 

In fig. 2, we define two lexical rules. A lexical rule defines a relationship between 
an ‘input’ and ‘output’ lexical entry. Therefore, lexical rules can be added to the 
fragment as instances of the relation lexical-rule(/n,Out). Furthermore, the set 
of lexical entries (basic or derived) is now defined by the relation entry. 

entry(Hl) <— 

basic-entry (HI) 
entry (HI) <— 

entry([o]) A lexical-rule (El, HI) 



%% complement extraction lexical rule (celr) 

LOC H] 



lexical-rule( 



COMPS H O 



SLASH { [2] } 



[comps Ejj) 



%% adjuncts lexical rule 

.[arg-st H 
lexical-rule( 



ARG-ST HI © ( ADV ) 
CONT advf( [2 ) 



CONT m 

Figure 2: Adding lexical rules 

The complement extraction lexical rule (celr) is adopted from Sag (1995), and 




n o 
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identifies an element on COMPS as a gap (i.e. subtype of synsem which satisfies 
the constraint that its SLASH-value is a singleton set, the only element of which 
is reentrant with LOC). The gap is absent in the output of the rule (O denotes 
sequence union). The interaction of this rule with slash amalgamation implies that 
the SLASH-value of the deleted element will be included in SLASH of the input (and 
output) sign. As each complement is also present on ARG-ST, and the SLASH-value 
of the sign itself is the union of the SLASH-value of its members, the instantiation 
of SLASH on one of these members will have a direct effect on SLASH. Furthermore, 
as it is assumed that information is shared by default between input and output, 
the instantiated SLASH value will be present on the output of the rule as well. 
Note, however, that the present formulation does not account for default sharing 
of information. 

Assuming the latter problem can be solved, the CELR allows the derivation of 
the following lexical entry, in which the object has been extracted: 



PHON hates 



ARG-ST 



(2) entry( 



SUBJ 

COMPS 

SLASH 



LOC 0 NPj 
SLASH { HI } 

(E) 

() 

0 w { 0 } 



^0 NPi [slash 0], 



) 



Together with slash amalgamation, and the assumption that SLASH is a head feature 
in head-valence phrases, while it gets ‘bound’ in head-filler phrases, this allows for 
the derivations of the example in fig. 3. 

The second lexical rule lexically introduces adjuncts as complements. Several 
versions of such a rule have been presented (Miller 1992; van Noord and Bouma 
1994; Manning, Sag, and Iida 1996). Here, we will assume, following Manning, 
Sag, and Iida (1996), that adjuncts are added to ARG-ST for reasons of binding and 
(adjunct) extraction (® denotes append). 

Again, this rule as given is incomplete. However, an appeal to default matching 
cannot give the correct results in this case. Note that the newly introduced adjunct 
should be added to COMPS as well. This means that the value of COMPS on input 
and output must differ, in spite of the fact that the rule does not mention them. 
Intuitively, the correct value for COMPS should follow from the map-args constraint. 
It is unclear, however, how that constraint could be made to apply at this point. For 
one thing, the interaction with complement extraction (which creates exceptions 
to the canonical mapping relation) appears to be highly problematic. That is, a 
lexical entry derived by means of complement extraction contains an element on 
ARG-ST which is not realized on COMPS (i.e. the derived entry for kiss above). 
Adding an adjunct to such an entry and reapplying map-args to the result would 
reintroduce the extracted complement. 

A similar difficulty arises in trying to account for adjunct extraction. The CELR 
relies on the fact that slash-amalgamation takes into account all elements on 
ARG-ST. However, the adjuncts rule introduces new elements on ARG-ST. This 
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S 




know 




>1 



Figure 3: Kim, we know Dana hates 



implies that extracting an adjunct by means of the CELR only has the intended 
effect if si ash -amalgamation is used to ‘recompute 1 the value of SLASH on the 
output of the rule. 



Problems for lexical rules 

We conclude this section with an overview of problematic aspects of lexical rules 
in a constraint-based setting. 

Default sharing between input and output. Lexical rules typically af- 
fect only a small part of the information in a lexical entry. To account for the 
similarity between input and output, some kind of default sharing of information 
is required. Default unification as defined in Bouma (1992), Carpenter (1992) or 
Lascarides, Briscoe, Asher, and Copestake (1996) either is not applicable to the 
typed constraint language presupposed by HPSG or to the problem of default shar- 
ing in lexical rules. Therefore, Meurers (1995) proposes a special-purpose default 
mechanism for lexical rules. Even if this problem can be solved, it is still the case 
that lexical rules are the only component of HPSG where nonmonotonicity comes 
into play. 

Interaction with Inheritance. The adjuncts lexical rule illustrates clearly 
that in some cases one wants to use inheritance of constraints to fill in missing 
information in the output, instead of default sharing. The discussion of lexical 
rules in Pollard and Sag (1987, chapter 8) also makes this assumption. No detailed 
proposals for such an interpretation of lexical rules exist, however. The conflict 
between these two interpretations of lexical rules also seems to have gone unnoticed 
in the literature. 
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Spurious ambiguity. The complement extraction lexical rule removes an 
element from COMPS. If this rule is used to delete two elements, say C\ and 
C 2 , one could either remove C\ first, or C 2 . The distinction is irrelevant for the 
outcome, however. Similarly, complement extraction removes an element, while 
adjunct introduction adds an element. Again, both orders are possible, but in 
general (the exception being cases of adjunct extraction) this will lead to the same 
result. To eliminate this kind of redundancy, one may have to introduce external 
rule ordering, reformulate the rules so that they no longer need to apply recursively 
(van Noord and Bouma 1994), or add finite state control devices (Meurers and 
Minnen 1995). 

Subsumption. Hinrichs and Nakazawa (1996) have argued that lexical rules 
should only be applied to lexical entries that are subsumed by the input conditions 
of the rule. However, not all lexical rules can be interpreted this way. Also, checking 
for subsumption appears to be incompatible with certain processing strategies. We 
return to this issue in section 3. 



2 A constraint-based alternative 

A radical solution for the problems just mentioned is to eliminate lexical rules 
and to account for valence variation by means of (recursive) constraints only. On 
the one hand, the elimination of lexical rules is a substantial simplification of the 
formalism. On the other hand, using recursive constraints for valence alternations 
is not a complication of the formalism, as recursive constraints are used in various 
other components of HPSG already. Note also that lexical rules in particular tend 
to be defined in terms of recursive constraints. That is, arguments that a system 
with lexical rules allows fewer or simpler constraints than a system without lexical 
rules can be rejected easily. 

The fragment presented in section 1 defines the relationship between argument 
structure and valence by means of a mapping constraint. Valence changing lexical 
rules, such as the complement extraction lexical rule, typically derive lexical entries 
which do not obey the ‘canonical mapping’. A more principled approach to valence 
alternations, therefore, is to take these lexical entries not as exceptions, derived by 
means of a rule, but to redefine the mapping between argument structure and 
valence, so as to allow for the ‘exceptional’ cases as well. 

The definitions in fig. 4 provide a reformulation of the definition of major verbal 
category lexical entries presented in fig. 1. The CELR and adjuncts lexical rules 
are made superfluous by a reformulation of map-args and the introduction of an 
adjuncts constraint on verbal lexical entries. 

Complement extraction 

The map-args constraint relates argument structure to the valence features SUBJ 
and COMPS, as before. The new map-non-sub j-args constraint ensures that non- 
subject arguments are either realized as complements, or as gaps. The range of 
lexical entries satisfying map-args as defined in fig. 4 therefore corresponds exactly 
to what can be derived by means of the CELR, making the latter spurious. 
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major( 



PHON 

HEAD 

ARG-ST 

SUBJ 

COMPS 

CONT 



© 

v 

12 © © 
© 

© 

a 



verbal-subcat( 



PHON 

ARG-ST 

CONT 



adjunct s(©, ©, 0) A 
map-args(EI 0 ©, ©, ©) 



© 

m 

© 



) A 



%% map- args(Arg-st,Subj, Comps) 
map-args ( (2 1 ©) , (2) , ©) <— 

map-non-subj-args(l2, ©) 



%% map-non-subj-args(Arg-st, Comps) 

map-non-sub j -args(( ),( )) 

map-non-sub j -args(( [2|©),([2|©))<— 

map-non-subj-args(H[, ©) 
map-non-subj -args(i 

map-non-subj-args(|2l, ©) 



LOC 

SLASH 



m 

{ E } 



I (D >, (U) 



%% adjuncts (Arg-st, Cont, Cont) 
adjuncts(( ), [2, [2) 
adjuncts(( ADV | (2 ) , ID ,©) <— 

adjuncts (12, aoh/(©), © ) 

Figure 4: A fragment without lexical rules 



ERIC 
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For instance, assume that the following can be derived by resolving ma j or with 
verbal-subcat and adjuncts : 



(3) raajor( 



PHON 

ARG-ST 

SUBJ 

COMPS 



hates 

m ( np<, npj ) 

m J 



map-args (Q], (H, 0) 



This can be resolved with map-args to give rise to exactly the following two results: 



(4) a. major( 



b. major( 



PHON 


hates 


ARG-ST 


( □ NP 


SUBJ 


<m> 


COMPS 


(HI) 


PHON 


hates 


ARG-ST 


( E NP. 
\ 


SUBJ 


( E > 


COMPS 


( ) 



HI NP ) 



LOC HI NP 
SLASH { HI } 



) 



Note that map-args imposes a constraint on lexical entries, and does not define 
a relation between lexical entries. Since all lexical entries, or at least all verbs, must 
satisfy map-args, and since there is no distinction between a ‘basic’ and a ‘derived’ 
lexical entry, the issue of default sharing of information disappears. 

A second advantage is that map-non-sub j-args recursively traverses ARG-ST 
and (nondeterministically) ‘decides’ for each element whether it is to be realized 
as complement or as gap. Consequently, the spurious ambiguity that was observed 
for the CELR in case multiple complements had to be extracted, does not arise in 
the constraint-based approach. 

Adding adjuncts 

In section 1, we assumed that the argument structure of a verb is fully determined 
by its subcategorization type. The adjuncts lexical rule, however, is incompatible 
with this assumption, as it appends elements to ARG-ST which the verb does not 
subcategorize for. 

The conflict can be resolved by assuming that argument structure is the append 
of two lists (E © HI in the definition of verb in fig. 4), where the value of the first 
is determined by the verbal subcategorization type, and the value of the second 
is determined by the adjuncts constraint. A similar modification is necessary 
to account for semantics (CONT). As adjunct semantics takes the basic verbal 
semantics as argument, the semantics of a verb is no longer directly determined by 
choosing a particular instance of verbal-subcat. Instead, verbal-subcat supplies 
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a basic semantic value which is taken as argument in the adjuncts constraint. The 
latter actually determines the CONT value of verb. 

The effect of these modifications can be illustrated as follows. Assume that 
major resolves with verbal-subcat to give rise to the following: 



(5) major( 



PHON hates 

ARG-ST E ( NPi, NPj ) © GO 

SUBJ E 

COMPS E 

CONT E 



adjunct s(E,/iate/(f, j),E) A map-args (E0E, E, E) 
Resolution with adjuncts, among others, gives: 





>HON 


hates 






ARG-ST 


0 ( NPj, NPj , ADV ) 




(6) major( 


SUBJ 


HI 


) - 




COMPS 


(U 






CONT 


adi/(hatei(i, j)) 




which in turn can be resolved against map-args tc 




"PHON 


hates 


1 




ARG-ST 


( 0 NPj, 0 NPy, 0 ADV ) 


(7) major( 


SUBJ 


(0) 






COMPS 


(0,0) 






CONT 


adt / (hate/(i, j)) 





) <- map-args(E, E, E) 



The newly introduced adjuncts constraint has exactly the same effect as the 
corresponding lexical rule. Since the constraint is integrated with the lexical hier- 
archy, however, the mapping between argument structure and valence is automat- 
ically accounted for. 



. Adjunct extraction 

Since the possibility of adjuncts on ARG-ST is now taken into account in the defin- 
ition of verbal lexical entries (i.e. the definition of major as given in fig. 4), 
slash- amalgam at ion will automatically apply to adjuncts on ARG-ST as well. 
Furthermore, the mapping between argument structure and valence, defined by 
map-args, will also take adjuncts into account. As si ash- amalgamation and 
map-args are the two constraints responsible for complement extraction, the pos- 
sibility of adjunct extraction is now just a special case of complement extraction. 
For instance, an alternative solution for (6) is: 
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Dana V 



COMPS ( m ) 
subj { CD > 

SLASH { E } 

hates 



m np 



Kim 



Figure 5: Intensely, we know Dana hates Kim 



(8) verb( 



PHON 


hates 


ARG-ST 


1 0 NPi, m NPj 
\ 


SUBJ 


(m) 


COMPS 


( m, m ) 


CONT 


ad^ (hate/ (i,j)) 



LOC 

SLASH 



E ADV 

{ 0 } 



) 



This allows us to derive the entry in (9) for hates , where slash- amalgamation has 
applied. An example involving this entry is given in fig. 5. 



(9) entry ( 



word 



PHON 


hates 


ARG-ST 


O^CO NPi[SL 0] 


SUBJ 


(0) 


COMPS 


(0) 


CONT 


adtf ( hate/(i,j )) 


SLASH 


0 w m w { m } 



LOC E ADV 
SL { E } 



Constraints declaratively and monotonically define the space of possible lexical 
entries, whereas lexical entries do this procedurally and nonmonotonically. There- 
fore, constraints can be integrated into a hierarchical lexicon definition in a way 
that is difficult or impossible for a system using lexical rules. Furthermore, since 
the system is declarative, procedural issues such as rule ordering and spurious am- 
biguity do not arise. Since constraints relate specific features, and not (complete) 
lexical entries, default sharing of information also is no longer necessary. 
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There are also linguistic benefits. A grammar avoiding spurious ambiguity is 
linguistically preferable over a system which does allow spurious derivations. Also, 
as shown above, the constraint-based approach can account for the possibility of 
adjunct extraction in a way that does not require any additional rules or mechan- 
isms. 



3 Complement Inheritance and Extraction 

In this section, we argue that the constraint-based approach also offers a solution 
for the spurious ambiguity problem observed in Hinrichs and Nakazawa (1996). 

Hinrichs and Nakazawa (1994) have argued that German modal and auxiliary 
verbs are complement inheritance verbs, i.e. they subcategorize for a possibly 
unsaturated lexical verbal complement, and include the complements of this verb 
in their own COMPS list. That is, a modal verb such as German konnen (to be able 
to) must be associated with the feature structure in (10). In Hinrichs and Nakazawa 
(1996) it is argued that a combination of complement inheritance and an approach 
to complement extraction based on lexical rules leads to spurious ambiguity in 
sentences containing modal or auxiliary verbs, as an inherited complement may be 
extracted not only from the COMPS-list of the verb which subcategorizes for it, but 
also from the COMPS-list of each of the verbs inheriting this complement. This is 



illustrated in (11). 










>HON 


konnen 










ARG-ST 


{ CD NP;, m Vj [comps (21 ] ) 






(10) 


SUBJ 






) 






COMPS 


ie ( ID ) 










CONT 


be-ablef(i,j) 








(11) 


m np 


m vjcOMPS ( □ ) 


vjcOMPS ( 0, m ) 




Welches Buch wird 


Peter kaufen 


konnen 




which 


book will 


Peter 


buy 


be-able 



Which book will Peter be able to buy 

The extracted element welches Buch appears on the COMPS list of two verbs, and 
thus a complement extraction lexical rule could apply to either kaufen or konnen. 

The solution proposed by Hinrichs and Nakazawa (1996) is to let lexical rules 
apply only to inputs which are subsumed by the input conditions of the rule. Since 
inherited complements are not instantiated (lexically) on COMPS of a complement 
inheritance verb, the complement extraction lexical can no longer extract inherited 
complements. This solution is not without problems, however. First, more recent 
versions of the CELR, such as the one proposed in Sag (1995), both instantiate and 
delete an element in the input. Thus, for such a rule it is crucial that the input is 
not necessarily subsumed by the input conditions of the rule. Second, subsumption 
appears to be thoroughly incompatible with processing strategies involving delayed 
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evaluation (van Noord and Bouma 1994), a technique which is relevant especially 
for the type of grammar considered in Hinrichs and Nakazawa (1996). For a sub- 
sumption test, the moment of evaluation, and thus the order in which constraints 
are evaluated, is essential. For delayed evaluation, however, it must be the case 
that order in which constraints are evaluated can be determined dynamically. 

The constraint-based analysis of complement extraction developed in section 2 
integrates the account of extraction with the mapping between argument struc- 
ture and valence. Remember that map-non-sub j-args determines for each (non- 
subject) element on ARG-ST whether it is to be realized as a complement or as a 
gap. Consequently, only arguments of a verb can be extracted. Since the extrac- 
ted np in examples such as (11) above appears on the ARG-ST of kaufen only, no 
spurious ambiguity will arise. Thus, the elimination of lexical rules also elimin- 
ates the problem observed in Hinrichs and Nakazawa (1996), without requiring a 
subsumption test. 

The introduction of complement inheritance does present another kind of chal- 
lenge, however. In the constraint-based fragment presented above, verbal subcat- 
egorization types only specify argument structure. The mapping between argument 
structure and valence is determined by a general map-args constraint. Comple- 
ment inheritance verbs are characterized by the fact that their COMPS-list may 
contain (inherited) complements which do not correspond to elements of ARG-ST. 
Consequently, complement inheritance verbs do not obey the map-args constraint 
as defined in the previous section. 

Complement inheritance can be accounted for if a rather different characteriza- 
tion of complement inheritance is introduced. Together with a modification of the 
map-args constraint this will make it possible to include complement inheritance 
verbs in the constraint based fragment developed so far. 

Whereas complements are normally saturated phrases (i.e. their COMPS- value is 
the empty list), the verbal complements of complement inheritance verbs need not 
be saturated. Thus, in terms of the verbal subcategorization relation introduced 
in the previous section, the distinction between a regular VP-complement taking 
verb, such as versuchen {try) and konnen is that the former requires a saturated VP 
whereas the latter selects for a (lexical) verbal complement, but does not impose 
any conditions on the value of COMPS of that complement. The relevant entries for 
verbal -subcat are given below. 





PHON 


versuchen 


(12) verbal-subcat ( 


ARG-ST 


{ NP i, Vj [COMPS ( )]) 




CONT 


versuchen! {i, j) 




PHON 


konnen 


verbal-subcat ( 


ARG-ST 


( NP i, Vj [comps CD ]) 




CONT 


konnen' {i, j) 



Note that CD in the second clause is provided only to make the contrast with the 
first clause explicit. As it is an anonymous variable, no constraint whatsoever is 
imposed on the value of COMPS. This suffices as a characterization of complement 
inheritance, if map-non-sub j-args is modified as follows: 
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(13) map-non-subj-args(( )) 

map-non-subj-args(( E [COMPS E ] 
map-non-subj -args([U, E) 



map-non-subj -args(< 

map-non-subj-args([2l, E) 



LOC E 
SLASH { E } 



E ), (E ® ( E ) ® S) ) 



E), E) 



The second clause, which maps non-subject arguments onto COMPS also prepends 
the complements of this element. This clause applies generally (i.e. to all comple- 
ments) and thus the possibility of complement inheritance is the rule, rather than 
the exception. Note, however, that for verbs selecting saturated complements, E 
in the definition above will be the empty list. In those cases, the definition of 
map-non-subj -args simply works as before. In cases where the value of COMPS 
of a complement is left unspecified (i.e. the verbal complement of an inheritance 
verb) the definition has the effect of prepending the complements of the verbal 
complement on COMPS, and thus a lexical entry will result which is identical to 
what is proposed in Hinrichs and Nakazawa (1994). 



4 Conclusions 

We have argued that recursive constraints can be used to eliminate a highly prob- 
lematic class of lexical rules, i.e. those affecting valence. Apart from avoiding 
a number of technical difficulties associated with the use of lexical rules, the 
constraint-based alternative has the advantage of providing a uniform account of 
complement and adjunct selection without spurious ambiguity. 
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Filtering Left Dislocation Chains 
in Parsing Categorial Grammar 

Crit Cremers* 

Maarten Hijzelendoorn* 



Abstract 

This paper reports on a way to reduce the complexity of the process of left 
dislocation (re)construction for categorial grammar in the case of lexically 
assigned gaps, as an additional restriction on the complexity arising from 
lexical polymorphism in general. Specifying extraction sites lexically has the 
advantage that the combinatory explosion can be contained in the preparsing 
track by a specialized constraint on the expansion of sequences of categories. 
This constraint is called the Left Dislocation Chain Filter and is implemented 
by a Finite State Transducer. It is shown that the Filter can reduce the 
number of full string assignments under consideration prior to parsing with 
an average of one half to one order of magnitude, depending on the nature of 
the sentence. 



1 Parsing Left Dislocation 

Left dislocation is a very common, almost universal phenomenon in natural lan- 
guages. It establishes the relation between an element at the left periphery of a 
clause and a particular, lexically open position in its right context. The leftward 
nature of dislocation is explained in Kayne (1994). The most prominent of these 
relations are invoked by so called wh-elements at the leftmost edge of questions and 
relative clauses. If a language has left dislocation, however, many other constituents 
can occur in a left dislocated position. Here are some examples from Dutch; the 
distinguished position in the right context is marked by t. 

(1) Mimi vroeg zich af [wie] Jan dacht dat t zou gaan winnen 
Mimi wondered (herself) who Jan thought that would go win 
‘Mimi wondered who Jan thought would win’ 

(2) [De film [die] ik heb t gezien] kan jij niet t gezien hebben 
The movie that I have seen can you not seen have 
‘You cannot have seen the movie I have seen* 



* Department of General Linguistics, Leiden University 
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In (2) we see two dislocations, one (a wh-induced type) within the boundaries of the 
other (a fronting of a ‘normal’, constituent). We will call the structure relating the 
left dislocated constituent and the empty position a left dislocation chain. We will 
refer to the two positions involved as the landing and the launch site, respectively, 
pursuing the dislocation metaphor. The lexical material at the landing site is also 
often referred to as the ‘filler’, and the other position as the ‘gap’. 

The properties and parameters of left dislocation chains are surely among the 
major topics of syntactic research in our days. It has become abundantly clear 
that there are major restrictions on these chains - represented by weak and strong 
islands for extraction - although there are many positions that may be part of one. 

Natural language grammar is supposed to establish left dislocation chains, as the 
interpretation of left dislocated constituents is determined by their chain. Parsing 
natural language grammars therefore involves the (re)construction of left disloca- 
tion. The nature of this construction will correlate to the grammar’s specification 
of dislocation, but the problem of parsing dislocation is quite general. At least the 
launch site of a chain is not explicitly marked - leaving aside prosodic information 
- and often, the lexical material in the landing site is not recognizable as being dis- 
located during lexical look-up. Consequently, a parser must actively compute the 
left dislocation chain in accordance with the grammatical nature of the relation; 
Van de Root (1990) e.g. has an insightful treatment of the computational problem 
of left dislocation for Marcus-parsing. The need for the computation is evident: if 
a parser deduces that a certain noun phrase may be left dislocated, it has to check 
the possible noun phrase positions in the right context for being the launch site. 
This kind of parsing problem does only occur if the left-peripheral constituent of 
a clause is selected by an entity to its right. Adverbial adjuncts, for example, do 
not necessarily belong to this class of chain inducing entities: they are modifying 
other constituents, rather than being selected by them. One might, however, find 
good reasons, along with Bouma and Van Noord (1994), to consider adjuncts as 
arguments after all. In that case, their occurrence at the left periphery of a clause 
must be seen as dislocation and gives rise to chains that have to be computed as 
well. The present study is not biased with respect to this alternative. 

In categorial grammar, one can think of at least two ways of establishing left 
dislocation chains. For categorial grammars exhibiting a full hypothetical logic, the 
‘gap’ is constructed by withdrawing a hypothetical occurrence of a category and 
thus, by introducing a complex category; this approach is pursued e.g. in Hepple 
(1990) and Morrill (1994; ch.8). 

Alternatively, gaps can be introduced as elements of lexical categories which 
are related to the filler by some form of ‘gap threading’ (cf. Pereira and Shieber 
1987). This is the approach taken in Delilah, a grammar/ parser system for Dutch 
developed by the authors. In the latter system, gaps are introduced as such in lexical 
assignments of categories to words, alternating with assignments providing lexical 
arguments (see below). In any case, the parser of a categorial grammar has to check 
several filler-gap combinations in a trial-and-error mode in order to determine a 
left dislocation chain. This paper reports on a way to reduce the complexity of the 
process of left dislocation (re)construction for categorial grammar in the case of 
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lexically assigned gaps, as an additional restriction on the complexity arising from 
lexical polymorphism in general. 



2 Lexical Ambiguity and Parsing Categorial Gram- 
mar 



Lexical ambiguity is known to be a major threat to efficient parsing of natural 
language. Barton, Berwick and Ristad (1987: ch.3) demonstrate that the combin- 
ation of simple agreement and lexical ambiguity makes natural language parsing 
NP-complete, i.e. a standard problem in the class of computationally intractable 
problems. Evidently, agreement can be taken to go proxy for all kinds of mutual 
dependencies between phrases in a sentence. Dependencies like agreement are at 
the heart of natural language, and adequate grammars must account for it. As a 
consequence, an adequate grammar of natural language can hardly be parsed effi- 
ciently if it has to allow for lexical ambiguity. In this vein, Johnson (1991) proves 
that even the Tomita algorithm for generalized LR parsing shows exponential com- 
plexity when applied to lexically ambiguous grammars. In this proof, the exponent 
is determined by the size of the grammar. These results confirm the observation 
in Gazdar and Mellish (1989: p.169) that “The cubic worst-case time efficiency 
problem for natural language parsers (...) is completely dwarfed in practice by a 
much more serious problem, that of pervasive natural language ambiguity”. 

Unfortunately, this statement is fully applicable to categorial grammar. Cat- 
egories can be seen as combinatorial agendas. Every category imposes a set of 
requirements on its context. A string of categories represents a well-formed sen- 
tence only if these requirements turn out to converge. A certain degree of lexical 
ambiguity - or rather: polymorphism per lexical atom - seems inevitable. In cat- 
egorial grammar, for example, differences in subcategorization (a verb may select 
an infinitival as well as a tensed complement), word order (a finite verb may have 
its complements to the left or to the right) or double functionality (a word might 
be a preposition or a particle) must lead, in some stage of the parsing process, to 
branching possibilities and an increase of search space. As an example of a lexical 
item which introduces many combinatorial agendas, consider the case of Dutch wil- 
len, ‘to want 1 . The lexicon of Dutch has to specify for willen at least the following 
different categories, which are casted here in a neutral format: 



( 3 ) 



(i) 


vp/vp 


(infinitival form with vp-complement) 


(ii) 


vp/s_sub 


(infinitival form with tensed sentential 
complement) 


(hi) 


s/vp/np 


(finite form with vp-complement for verb-second 
and verb-first main sentences) 


(iv) 


s/s_sub/np 


(as (iii), but with sentential complement) 


(v) 


s_vn\np/vp 


(as (iii), but for verb-final sentences) 


(vi) 


s_vn\np/s_sub 


(as (v), but with tensed complement) 



This list is not necessarily complete. For example, if one needs to distinguish de- 
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clarativity from other sentential modes like questioning, more sentential categories 
may be added. The variety that arises, cannot be handled by type changing rules. 
As a matter of fact, none of the types listed in (3) can be deduced from another 
type in the list by canonical type changing rules, though every finite type is a 
regular and predictable expansion of one of the basic infinitival types. 

In general, we can describe the problem of lexical ambiguity for categorial gram- 
mar as follows. Let Gjv/, be a categorial grammar for a language NL, and L a lexicon 
with initial assignment A(w\) of nonterminals to the words w\ of NL. For example, 
A (willen) contains at least the categories given in (3). Let S = w\ . . .w n be a sen- 
tence over L. Deciding whether S is in NL amounts to searching some sequence C 
= Ci . . .c n , with Ci € A(w\) such that C is derivable under G^l- The solution to 
the problem may require checking the derivability of many such Cs. Basically, for 
a certain S the number of sequences the derivability of which must be checked is 
11" |A(u>i)|, the Cartesian product over which is exponentially dependent on n. 
Iiy |A(u>j)| defines the search space for parsing S. The search space should not be 
defined by spurious ambiguity, however. It makes sense to require for each c in the 
lexical assignment of some word w that there is a sentence in NL containing 
which can only be derived under Gjv/, if c is in A(w). In this vein, we require every 
initial assignment of a category to a word to be necessary with respect to G^/,. 
Note that the search space is not influenced by the algorithm of the theorem prover 
itself. One can, however, represent lexical ambiguity as a complex category by 
means of additional type constructors, as is suggested e.g. by Morrill (1994: ch.6). 
To an ambiguous term a conjunction of categories and/or a disjunction of ar- 
guments is assigned. At each deduction step in which this complex category is 
involved, the theorem prover has to choose which of the coordinated types is ac- 
tivated. No hope is offered, though, as to the efficiency of this procedure. 

This approach to ambiguity shows, however, that there is, to a certain degree, 
a trade-off between lexical ambiguity and properties of the grammar. In particular, 
a grammar may assign nonterminals to certain lexically assigned nonterminals, 
by having monadic rules or theorems of the type c — ► c'. One can think of the 
famous lifting rule x — ► y\{y/x) and the Geach rule x/y — ► ( x/z)/(y/z ). Their 
presence, however, causes a serious problem for theorem proving itself, as they 
induce spurious ambiguity (see e.g. Wittenburg 1986, Konig 1990, and Hepple 
1990 for analyses and performance-oriented remedies of this phenomenon). In these 
cases, the search space for parsing S is partially constructed by the grammar itself. 
Again, the question whether G^l derives S, induces an explosion of queries as to 
whether derives a certain C. 

Because of the logic of categorial grammar, theorem proving is a genuine model 
for parsing these grammars (Hepple 1990). Propositions, which have to be checked 
for being a theorem, are sequences of categories. If a sentence gives rise to more 
than one sequence, all these sequences - i.e. all these combinations of combinatorial 
agendas - have to be checked. Of course, the theorem prover itself can be trimmed 
in several fashions, pertaining on the complexity of the proof-finding process. If, 
however, the categorial grammar involved is of a (mildly) context-sensitive nature 
(as are the Combinatory Categorial Grammar of Steedman (1990) - see Joshi et. al. 
(1991) - and the grammar in the present Delilah system - see below) the theorem 
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prover is confronted with the fact that context-sensitive recognition is PSPACE- 
complete (Hopcroft and Ullman 1979; ch.13). But the major burden on the parsing 
process is imposed by the multiplicity of potential theorems, independently of the 
properties of the deductive system. Below we will discuss how the combinatorial 
explosion of hypotheses can be controlled in DELILAH. 

Here we will concentrate on the processing of a particular source of lexical poly- 
morphism: the possibility that in a locally-defined argument structure one argument 
may be missing as a result of left dislocation, also known as movement to [Spec, 
CP]. The categorial lexicon of Dutch may specify as a category of the verb zien ‘to 
see’ not only vp\np to indicate that it is meant to head a configuration with an np 
to its left, but also vp\np A gap to indicate that that object may not be present. 

If we consider the category vp\np as introducing a local tree of type (4) (i) , to 
which a local tree with a root np can be adjoined at the node marked as such, 
the category vp\np A gap must be seen as the introduction of the local tree (4) (ii) 
in which this node is barred by emptiness. Structurally, the trees are the same, 
though they will be treated differently by the rules of grammar (see below). 

(4) 



(i) vp (ii) vp 




np v np v 



zien e zien 

Gapped categories are invariantly specified as left arguments, since dislocation is 
leftward. Therefore, the gapped counterparts to the categories pp/np and vp/vp 
are pp\np A gap and vp\vp A gap , respectively. 

The additional specification of zien as a verb that may lack an adjacent object 
is predictable - the objects of transitive verbs are generally available for extrac- 
tion - but not trivial. Not all np arguments occurring in some lexically assigned 
category are candidates for extraction; the np in a possessive determiner (np } s n), 
for example, is not. Moreover, it is useful to store the information that, within a 
certain complex category, at most one argument is available for extraction. The 
verb geven ‘to give’ has one category specifying a bitransitive infinitive, but also 
two additional categories specifying the separate extractability of each argument 
of that infinitive - only one argument can be dislocated, of course. Consequently, 
in infinitival position geven has three lexical options, instead of one; the number 
of finite lexical categories associated with geven multiplies accordingly. 

The specification of extractability, then, may increase the number of categories 
assigned to a particular lexical item and contribute to a search space explosion for 
parsing. As noted above, one could choose not to specify launching sites lexically, 
but to compute the possibilities while parsing. For Lambek’s categorial grammar, 
there is the option of hypothetical reasoning, as in Morrill (1994), and in other 
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frameworks one can implement other forms of gap threading, as in Stabler (1992). 
But deriving all possible extraction sites is not necessarily more efficient than spe- 
cifying them. In fact, we will show that specifying extraction sites lexically has the 
advantage that the combinatory explosion can be contained in the pre-parsing track 
by a specialized constraint on the expansion of sequences of categories. Moreover, 
the lexical approach complies with the argument by Johnson and Kay (1994) that 
gap hypotheses in a derivation must be licensed by lexical items in order to assure 
termination of the parsing process. 

To a large extent, the art of parsing is finding secure means to restrict the num- 
ber of possible assignments. Given the exponential function nj |A(u;i)|, efficiency 
requires serious pruning of the search space. Optimally, the means to achieve this 
are anchored in the grammar that is to be applied. Resource-sensitive categorial 
grammar offers some options for pre-checking assignments. The parsing system DE- 
LILAH incorporates an instance of a bracket-free, mildly context-sensitive categorial 
grammar. It deals with various forms of discontinuity, like free coordination, verb 
raising and long distance dependencies. 

The grammar basically consists of one cancelling operation, generalized com- 
position (cf. Steedman 1990, Joshi. et al 1991), operating on two triples of the 
form Head \ LeftArgumentList / RightArgumentList. 

Heads are basic types; they may be cancelled while the argument lists of their 
categories merge with the argument lists of the category that provokes the can- 
celling. The formalism and its applications are discussed in Cremers (1993; ch.2). 
A related, but slightly less expressive formalism is defined in Milward (1995) as 
AB Categorial Grammar with Associativity (AACG). delilah’s grammar may be 
taken to have at least mildly context-sensitive power, as it extends the concept of 
generalized composition, which Joshi et al. (1991) prove to be in that class. 

In (5) we present the general scheme for a left-to-right cancelling. (Of course, 
right- to-left cancelling also exists.) 

(5) DELILAH’s Grammar Format 
If a string with category 

PrimaryHead \ LeftList / [Secondary Head A Operator | RestRightList] 
occurs to the left of a string with category 

SecondaryHead \ OtherLeftList / OtherRightList, 

combine these strings - under some restrictions triggered by Operator with 
respect to the content of the argument lists - to a string of category 

PrimaryHead \ NewLeftList / NewRightList, 

where NewLeftList and NewRightList stem from appending the two left lists 
and the two right lists, respectively, in either one of two possible orders, which 
encode either continuity or discontinuity. 
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In a rule notation we get (6), where append 0 ( L1,L2) is some append-operation 
triggered by some operator A o, yielding a list whenever it is defined for LI and L2, 
and non-executable otherwise. In the latter case, the two categories to the left of 
the arrow cannot reduce to one by cancellation of Sec. 

(6) Prim\LListl/[Sec A o|RListl] Sec\LList2/RList2 — ► 
Prim\append 0 (LListl,LList2)/append 0 (RListl,RList2) 

A string is considered to be a well-formed sentence, iff the lexical hypotheses (cat- 
egorial agenda’s) can be reduced by recursive applications of (6) to one category 
s\ /////• This grammar strictly preserves directionality (cf. Steedman 1990), only 
cancels elementary types, and does not use hypothetical reasoning. Instead, be- 
cause the composition rule takes into account the full internal structure of both 
the primary and the secondary category, non-peripheral extraction can be treated 
without specialized operators like the up and down arrows of Moortgat (1988). Dis- 
location and other forms of word order variation can be handled by composition 
alone. (7) shows two options of adjunction to a verb; both options are available 
if the operator A o is defined for the relevant internal structure of the secondary 
category at the stage of cancelling vp . 



(7) (i) vp\0/[vp A o] np\0/D 


vp\[np A 


-1/0 =* 


v p\D/( v p a °] v P\0/0 


=> 




v p\D/D 








(ii) np\0/0 


v p\D/( v p a °] 


vp\[np A 


-1/0 


n p\D/D 


vp\[np A -]/0 


=> 




vp\0/0 









The combination of directionality and the fact that the grammar only cancels basic 
types - as does Milward’s (1995) AACG - has interesting consequences for parsing. 
For a given prefix P(i) of assignments to the first i words of a sentence, we can 
decide whether it makes sense to add to it a certain lexical category of the i + 1th 
word, that is, we can decide whether P(i) plus that certain category can be the 
prefix P(i + 1) of a sequence of categories which may be parsed succesfully. If not, 
that particular extension of P(i) is rejected, and with it all the (virtual) sequences 
in the set with cardinality nj|A(u;i)| that have it as a prefix. Technically, these 
prefixes are best considered to be paths of a tree which is built tier-by-tier dur- 
ing lexical look-up: a directed acyclic graph with categories as vertices and edges 
between neighbours in a sequence. At the extreme vertex of every path, information 
is accumulated on the type pattern of the path. A new category is added as a vertex 
connected to a path and extending that path only if its agenda is not incompatible 
with the information at the ’preceding’ vertex. When the category is added as a new 
vertex, it stores the updated information on the extended path. If no category of a 
certain word is compatible with an existing path, the path is pruned; all remaining 
(active) paths are of equal length. The main instrument here is an operationaliz- 
ation of Count Invariance (Van Benthem 1986, Moortgat 1988, Konig 1990): the 
property that sequences of complex types can be derived only if they exhibit a 
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certain balance of primitive types. This is the central strategy used in DELILAH 
to restrict the search space, even in the context of coordination (see Cremers and 
Hijzelendoorn 1997 and Cremers 1989). Thus, the search space is considerably lim- 
ited on-line. As a matter of fact, while building the tree, the pruning rate exceeds 
the growth factor of the search space (cf. Cremers and Hijzelendoorn 1997). As 
nj |A(wi)| explodes with n, the proportion #active-paths-of-length-n /IIJ | A(wi)\ 
decreases exponentially. The decrease of this proportion justifies the construction 
of the pruned tree of viable prefixes-up-to-n. For a numerical illustration of this 
effect, consider the parsing of the sentence 

(8) Wie zegt de man, die ik wilde laten werken, dat hem gedwongen heeft mij 
met de poppen te laten spelen? 

Who says the man that I wanted let work that him forced has me with the 
dolls to let play? 

‘Who does the man, I wanted to work, say that forced him to let me play 
with the dolls? 5 

Under Delilah’s present lexicon, the search space Iiy \A(wi)\ for this sentence con- 
sisting of 20 words contains 822,528,000 possible assignments (paths). Building 
and pruning the tree with a remainder of 109 paths takes 6,120 ms cpu time (ex- 
cluding time for garbage collecting, stack shifting, or in system calls) on a Silicon 
Graphics Indigo R4000 workstation; this includes the time taken by the special 
pruning algorithm for gapped categories to be discussed below. Parsing these 109 
non-rejected sequences takes 360 ms, say 3 ms for each. Since there is no principal 
difference between paths which were pruned and paths which survived pruning, we 
can deduce that parsing the whole tree of sequences would have taken 822,528,000 
times 3 ms is 2,467,584,000 ms. This is about 411,000 times as much as DELILAH 
needed to construct-and-prune the search space. 

It is worth noting that the pruning does not put any claim on the parsing 
strategy that is applied. The dynamic application of Count Invariance only selects 
what has to be parsed, not how this task is performed. In particular, since not 
all the remaining paths will differ from each other at any node, one can think of 
a form of chart parsing to exploit the remaining hypotheses space. On the other 
hand, little is known about efficient parsing of context-sensitive grammars. 

Under the grammar sketched above, left dislocation is solved syntactically by bring- 
ing together the left dislocated constituent and a unique gap. The gap is transported 
leftwards by generalized composition: in fact, a gap is ‘inherited 5 by every category 
of every string that contains the word introducing the gap. Finally, the gap is the 
only left argument of the constituent to the right of the dislocated phrase. In the 
example derivation (9) with a left dislocated noun phrase, X and Z are sequences of 
categories. Only a few stages of the derivation are made explicit; they are connected 
by subsequent application of generalized composition (5-6). 
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(9) n P\ 0/ 0 s\0/[-l X y\[...]/[vp A o] vp\[. . . ,np A gap]/[. . . ] Z =>■ 
n P\0/0 s\D/[...) x y\[...,np A gap]/0 => 
n P\0/0 s\0/[x A o] x\[np A gap]/0 => 
n P\0/0 s\[np A gap]/0 => 
s\D/D 



Except for special circumstances, which we will not consider here (e.g. parasitic 
gaps; for a treatment see Morrill 1994), gaps and dislocated constituents are related 
one-to-one. 



3 Filtering Left Dislocation Chains 

Although Count Invariance (exploiting the resource sensitivity of certain categorial 
systems) is operationalized in DELILAH for on-line reduction of search space prior to 
proper parsing, it cannot discriminate between configurations (possible prefixes of 
sequences of lexical assignments) other than by the mere occurrence of basic types. 
Gaps have to be marked in the lexicon for the category they represent. If a gap is to 
be bound to a preposed np, it has to be marked for this binding as a gap or variable 
of that particular type. Consequently, the introduction of gaps may increase the 
number of categories of a certain lexical item that looks for a particular type to 
the left; gaps are always bound by left dislocated constituents (cf. Kayne 1994). As 
for the application of Count Invariance, there is no difference between a category 
vp\[np]/[] and its gapped relative vp\[np A gap]/[]: they expand the same tree in 
terms of number, labels, and structure of nodes (cf. (4)). If Count Invariance allows 
for the attachment of one of them to a prefix P(i ), it also allows for the attachment 
of the other. In this case, the number of sequences to be checked for grammaticality 
is doubled by the mere presence of a gapped category in the lexicon for the i + 1th 
word in the sentence. In general, Count Invariance is underspecified with respect 
to the grammatical extension of prefixes, as it is applied prior to parsing. 

The main contribution of gapped categories to the extension of the search space 
is caused by the gap’s location being undetermined. In a given string of words, 
there are many possible candidates that introduce a gapped category, though in 
the final parse only one of the candidates will turn out to be the real gapped 
category. Sequences like the following are hardly candidates for succesful parsing if 
only one of the categories introduces a finite domain. 

(10) np ... s\[np A gap]/_ ... vp\[np A gap]/_ ... 

It makes sense, then, to look for additional means to prevent spurious accumulation 
of gapped categories in a sequence, just to accommodate the gap’s indeterminacy 
a priori. 

For every prefix of a sequence of lexical categories, we use a Finite State Transducer 
(FST) in DELILAH to keep track of the maximal number of gaps that must be made 
available in the suffix of that sequence. This FST performs its tasks in the very 
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same pre-parsing track in which Count Invariance is used dynamically to prune 
the search space. The timing for the pre-parsing procedure for (8) given above, 
included the operations of the FST. 

The general idea is as follows. Every prefix P(i) of a sequence of assignments 
S(n) is associated deterministically with a state of an FST and a number generated 
by that state. Recall that these prefixes can be seen as paths of a tree under 
construction. The number that the FST provides, is associated with the extreme 
vertex of that path. This number indicates how many gap ‘slots’ are available for 
a suffix to that prefix. The states of the FST represent the relevant features of a 
particular P(i). The i + 1th word may introduce a category for which the FST 
is defined in that state, or not. If the category is in the domain of the state of 
the FST with which P(i) is associated, it will force the FST to move. This move 
may involve adding 1 or subtracting 1 from the counter, or even a reset of the 
counter. Addition moves are forced by types which introduce a syntactic domain 
from which extraction is permitted. Finite verbs, for example, force addition moves. 
Subtraction moves are forced by categories containing gaps, or - in certain states 
- by categories that indicate closure of extraction domains. No subtraction move 
can be made if the counter is zero. In that case, the FST fails, and the category 
will not be added to P(i) to form P(i + 1), since P(i + 1) cannot be associated 
with a state of the FST. The category may be added to some other prefix P(i), 
though. Also, another category from the i + 1th word’s lexical assignment may be 
added to P(i) to form a P(i H- 1). 



Thus, the moves of the FST are triggered by types occurring in a category of 
the i + 1th input word for a given prefix of assignments P(i). In its actual form, 
DELILAH activates the FST only if the category that is a hypothetical extension 
of a prefix of a sequence of assignments is either 1. headed by a main sentential 
type, 2. has a finite sentential type as an argument, 3. contains a gapped type, or 
4. is headed by prepositions. The first two triggers will be evident: they are the 
ones that indicate finite domains, the main extraction fields. They force the FST 
to make an addition move. The third one is also evident: it is the one that forces 
a subtraction move. PPs need a special treatment in order to account for certain 
consequences of pied-piping. They have to make available an additional gap option, 
apart from the option that allows for their own dislocation which is introduced in 
a standard way. 

Here is a description of the FST and some comments. The triggering types are: 



head_main_s (hms) 
head_embedded_s (hes) 



head.pp (pp) 



the type that is the head of a finite verb in 
main sentences; e.g. s in s\[np A gap]/[vp ] 
the type that is the head of a finite verb in 
embedded sentences; e.g. s.vn in 
s.vn\[np np]/[ ] 

the head of a prepositional category; e.g. 
PP in pp\[ ]/[np] 



o 

ERIC 

hfliflaffHHaaaa 
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right.argument.embedded_s (raes) 



gap (gap) 



the type that is a right hand side argument 
of a complementizer, announcing the start 
of an embedded sentence; e.g. sjvn in 
np\[np]/[sjvn] 

the left hand argument of any category that 
is marked for gap; e.g. pp A gap in 
vp\[np pp A gap}/[ ] 



Of each type, the head is processed first, then its left searching arguments, and 
finally its right searching arguments. The transition scheme is given in (11) as a 
relation between a triple <state, type, number> and a pair <state, number>. Some 
very idiosyncratical transitions are left out for transparency reasons. 

(11) Left Dislocation Chain Filter 



initialization: <a, 0> 
<a, hms, X> =$► <e, 1> 
<a, hes, X> => <a, X> 



<a, gap, X> = 
<a, raes, X> 
<b, raes, X> 
<b, hes, X> = 
<b, gap, X> = 
<c, hms, X> : 

<c, hes, X> = 
<c, raes, X> = 
<c, gap, X> = 
<d, hes, 0> =i 



<a, X— 1 > if X > 0 

- <b, X+l > 

> <Cb, X+l > 

<b, X> 

• <b, X-l > if X > 0 

- <e, 1> 



<d, X> 
r> <b, X+l > 

► <c, X— 1 > if X > 
<a, 0> 



0 



finite main verb resets gap options 
finite embedded verb does not change 
options; necessary for coordinated structures 
consumption of gap option 
introducing a new domain for dislocation 
idem 
as before 
as before 

all options not yet consumed are lost; number 
of options reset to one 
just a state transition 
as before 

consumption of gap option 
re-initialization; the end of a domain is 
introduced; all options used 
end of domain; gap option for that domain 
not used; number of options decreases 
idem, but also start of new domain: options 
not changed 
as before 
consumption 

introduces new domain with additional 
option 

consumption 

for every state S a special state [S] is needed 
for pied-piping phenomena, introducing an 
additional gap option 
consumption in ‘pied-piping’ state 
empty move otherwise; the extra option is 
cancelled 

A category C will be added to a prefix of assignments P(i) in state S with number 
N if the FST is defined in S for the types in C. A simple example is given in (12). 



<d, hes, X> => <c, X— 1 > 



<d, raes, X> =$► <b, X> 



<d, hms, X> 
<d, gap, X> = 
<e, raes, X> = 

<e, gap, 1> =1 

<S, pp, X> 



=> <e, 1> 

> <c, X— 1 > if X > 0 
P- <Cb, X+l > 

<a, 0> 

<[S], X+l > 



<[S], gap, X> =f- <S, 
<[S], €, X> =► <S, X- 



X-l > 
-1 > 



o 

ERIC 
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(12) Ik wil elke man een boek geven 

I want every man a book give 

‘I want to give every man a book’ 

Let the string of categories in (13) be a prefix of assignments (one of possibly many 
more) to the first six words, up to boek , i.e. P( 6). 

(13) np s\[np A gap]/[vp] np\[]/[n] n np\[]/[n] n 

That prefix, just one among others that may have survived thus far, will be asso- 
ciated with state <a,0>. This can be seen as follows. The FST is entered in state 
<a,0>. The first type, np, is not a triggering type; the FST does not move. The 
head of the second type, s is an hms; the FST moves to state <e,l>, creating an 
option for a gap to be consumed later. The left searching category np A gap is a gap, 
which brings the FST back to state <a,0>. The right searcher vp has no effect, 
because it is not a triggering type. The same is true for the heads and searchers of 
the third, fourth, fifth and sixth type in (13). Now geven will be considered. Among 
its lexical categories we find some that contain gaps, like s\[np A gap np np\j[ ] and 
v p\[np np A gap }/\\ . Most of them will be rejected for concatenation to that prefix, 
since a category containing a gap cannot be processed from <a,0>. The only op- 
tion for a gapped category of geven would be one that is headed by a finite main 
sentence type (hms), for this category would bring the FST in state <e,l>, in- 
troducing a gap option. This option is rejected, however, because of other filtering 
mechanisms apart from the left dislocation FST. Only categories headed by vp with 
no gapped argument remain as possible continuations of the prefix. The number of 
assignments that has to be parsed (checked for derivability) is seriously cut down. 



4 The Effect of Chain Filtering 

We have tested the effect of chain filtering by computing three values for a certain 
set of sentences: 

(a) the number of full string assignments left under chain filtering 

(b) the number of full string assignments left without chain filtering 

(c) the number of full string assignments in case the lexicon would have no gaps. 
In the latter case chains must be identified by hypothetical reasoning. In our ap- 
proach gaps are fixed per full string assignment: no additional hypotheses as to the 
possible occurrences of gaps are necessary while parsing. 

In (figure 1) at the end of the paper the table of results is given; the results 
are presented as natural logarithms to express their order of magnitude. They are 
ordered with respect to the length of the sentences in the test set (column a). All 
counts are submitted to a cluster of other filtering devices which are not related 
to left dislocation, but which do a major job at distinguishing viable from inviable 
prefixes, as was discussed for example (8). The numbers of full string assignments, 
which survived the general filtering devices including and excluding the Chain Fil- 
ter, mark a rather undetermined stage in the processing of the sentences; the strings 
counted below are not necessarily all parsed. Most of the sentences are coordinated 
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sentences, which complicates any filtering of prefixes: the selection is pre-parsing, 
the coordinates are not yet determined at that stage, and this indeterminacy must 
be reflected in weakened application of the FST (which is not spelled out above). 

The exponents given in column (f) of the table hold the main result of chain 
filtering. They indicate the difference between the number of sequences of cat- 
egories that must be processed if chain filtering is applied (column d) and the 
number of sequences of categories that must be processed if chain filtering is not 
applied (column e). Both numbers can be compared to the number of sequences 
that would survive Count Invariance if the lexicon would not contain gapped cat- 
egories (column c). In that case, however, DELILAH will not be able to parse left 
dislocation any longer. Column (b) lists the Cartesian product over u>j, i.e. without 
any filtering, but including gapped categories. It defines the upper bound for the 
exponents in the columns (c), (d) and (e). 

From these figures one can see that the Left Dislocation Chain Filter has a 
measurable effect on the number of sequences that must be parsed. It can reduce 
the number of full string assignments under consideration prior to parsing with an 
average of one half to one order of magnitude (column f), depending on the nature 
of the assignments, i.e. the nature of the sentence. In the final case, for example, the 
application of the Chain Filter reduces the number of full string assignments at this 
particular stage of processing by a factor of e 1,39 = 4. All possible analyses are kept 
in store, however; in that respect, chain filtering is as conservative as necessary. 

The number of sequences left after chain filtering is still considerably larger 
than the number a parser checking dislocation by hypothesising gaps would have 
to consider. This is not surprising. Specifying gaps lexically introduces at least n 
additional sequences for every gap-less sequence of assignments with n extractable 
arguments. It depends on the parsing procedure to what extent this complicates the 
parsing of the sentence. DELILAH can parse these additional assignments marked 
for gaps deterministically: the number of assignments is the only factor affecting 
the complexity of the solution to the problem of left dislocation. 

It is by no means clear that the Left Dislocation Chain Filter is stated in the 
best possible way. It must be stressed, however, that in the presence of coordination 
- all but three of the sentences measured above are coordinated ones - filtering left- 
dislocated chains is weakened by necessary precautions with respect to across-the- 
board phenomena. As long as one does not know what is coordinated exactly, the 
substring to the right of a coordinating element may have to accommodate all the 
chains that were possibly established at the left of the coordinator. In this respect, 
the results show that chain filtering keeps performing under difficult conditions. 
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column a: sentence length in words 

column b: natural logarithm (In) of the number of full string assignments 
(unfiltered; n" |^4(wi)|) for a lexicon with gapped categories 
column c: (In of the) number of full string assignments (filtered by independent 
checks) for a lexicon without gapped categories 

column d: (In of the) number of full string assignments (filtered) for a lexicon 
with gapped categories, and with application of chain filtering 
column e: (In of the) number of full string assignments (filtered) for a lexicon 
with gapped categories, and without application of chain filtering 



column f: difference d — 


e; effect of chain filtering, in orders of magnitude; 


a negative 


effect means reduction of the search space 






a 


b 


C 


d 


e 


f 


#words 


Cart, product 


^assignments 


# assignments 


^assignments 


Chain 




over Wi 


-gapped cats. 


-fgapped cats. 


-fgapped cats. 


Filter 




-fgapped cats. 


-Chain Filter 


-fChain Filter 


-Chain Filter 


effect 


7 


1.60 


0.69 


1.09 


1.09 


0.00 


8 


4.02 


2.07 


3.17 


3.17 


0.00 


9 


9.91 


4.14 


6.76 


6.83 


-0.07 


10 


4.02 


2.07 


3.17 


3.17 


0.00 


11 


11.38 


0.0 


2.07 


2.77 


-0.70 


13 


10.13 


3.17 


5.41 


5.77 


-0.25 


14 


14.02 


3.17 


7.32 


7.57 


-0.25 


15 


6.10 


3.46 


4.85 


4.85 


0.00 


16 


15.02 


4.27 


8.58 


8.95 


-0.37 


17 


14.45 


3.87 


7.76 


8.41 


-0.65 


18 


18.65 


1.79 


5.54 


7.56 


-2.02 


19 


17.44 


5.25 


9.77 


10.22 


-0.45 


20 


16.60 


5.06 


10.06 


10.63 


-0.57 


21 


13.68 


4.85 


9.63 


10.02 


-0.39 


22 


20.06 


3.46 


9.10 


10.58 


-1.48 


23 


15.83 


4.56 


9.36 


9.80 


-0.44 


24 


8.05 


3.46 


6.64 


6.64 


0.00 


25 


18.31 


7.09 


13.66 


13.99 


-0.33 


26 


9.57 


4.85 


8.72 


8.72 


0.00 


28 


21.03 


5.95 


12.07 


12.58 


-0.51 


29 


26.36 


5.50 


13.10 


14.08 


-0.98 


32 


29.43 


1.09 


7.24 


8.65 


-1.41 


34 


23.12 


5.25 


12.79 


13.64 


-0.85 


37 


28.18 


4.85 


13.23 


14.62 


-1.39 



Figure 1: Table of Results 
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Speech Output Generation in GoalGetter 

Esther Klabbers* t 



Abstract 

In this paper a method for speech output generation in data-to-speech sys- 
tems is proposed, called phrase concatenation, which tries to find a balance 
between naturalness and flexibility of the speech output. The GoalGetter sys- 
tem, which generates spoken monologues on football matches, serves as an ex- 
ample. The phrase concatenation technique involves concatenating prerecor- 
ded words and phrases, which is new in that different prosodic versions of 
otherwise identical phrases are recorded. 



Introduction 

The main issue addressed in this paper is the problem of generating high qual- 
ity speech in data-to-speech systems, i.e., systems which present data in the form 
of spoken monologues, sometimes also called concept-to-speech systems. Data- 
to-speech generation is a relatively new area of research. Traditionally, research 
on spoken-language generation was mainly undertaken within the separate fields 
of natural-language generation and text-to-speech synthesis. State-of-the-art lan- 
guage generation is capable of generating flexible utterances and texts, but often 
the intonational properties are not taken into account. Text-to-speech synthesis 
often fails to generate adequate prosody due to the lack of information available 
in texts. In contrast to text-to-speech systems, explicit discourse models can be 
reliably constructed in data-to-speech systems, so that a more natural prosody can 
be achieved. 

The method of speech output generation is explained in the context of a simple 
data-to-speech system called GoalGetter, which generates spoken monologues on 
football matches. GoalGetter generally works as follows: it takes as input a Teletext 
page that contains summary information on a particular football match. The 
Teletext page lists the two teams that played against each other, the score, which 
players scored when, etc. From this concise information, the language generation 
module (LGM) generates a coherent text using syntactic templates. The output 
text, enriched with prosodic markers, is passed on to the speech generation module 
(SGM), which makes it audible through one of two output modes, i.e., diphone 
synthesis or phrase concatenation. 

TPO, Center for Research on User-System Interaction 

^This research is carried out within the framework of the Priority Programme Language and 
Speech Technology (TST). The TST-Programme is sponsored by NWO (Netherlands Organiza- 
tion for Scientific Research). 
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Before explaining the phrase concatenation technique, it is necessary to get a 
general idea of the working of the LGM. It is responsible for the content and form 
of the utterances and the prosodic properties, and as such sets the pre-conditions 
the SGM has to satisfy. 



1 Language Generation in GoalGetter 

The technique used for natural language generation in GoalGetter was originally 
developed at IPO for an English-spoken database query system called Dial-Your- 
Disc (DYD). This system generates spoken monologues about compact discs with 
musical compositions written by Mozart (van Deemter, Landsbergen, Leermakers, 
and Odijk 1994). The architecture of the LGM is depicted in Figure 1. 



Data 




Enriched 

Text 



Figure 1: The architecture of the Language Generation Module (LGM) 







”De ”wedstrijd tussen ”PSV en ”Ajax / eindigde 
in ”@een // - ”@drie /// ” Vijfentwintig duizend 


team 1: 


PSV 


’’toeschouwers / bezochten het ’’Philipsstadion /// 


goals 1: 


1 




team 2: 


Ajax 


”Ajax nam na ”vijf ’’minuten de ’’leiding / door een 


goals 2: 


3 


’’treffer van ’’Kluivert /// ’’Dertien minuten ’’later / 


goal 2: 


Kluivert (5) 


liet de aanvaller zijn ’’tweede doelpunt aantekenen 


goal 2: 


Kluivert (18) 


/// De % ’’verdediger ’’Blind / verzilverde in de 


goal 2: 


Blind (83/pen) 


’’drieentachtigste minuut een ’’strafschop voor Ajax 


goal 1: 


Nilis (90) 


III Vlak voor het ’’eindsignaal / bepaalde ’’Nilis van 


referee: 


Van Dijk 


”PSV de ’’.eindstand / op ”@een // - ”@drie /// 


spectators: 


25.000 




yellow 1: 


Valckx 


% ’’Scheidsrechter van ”Dijk / ’’ieidde het duel III 
’’Valckx van ”PSV kreeg een ”gele ’’kaart /// 



Figure 2: Example input and output of the LGM 

The input for the Generation module in the LGM is formed by a textual rep- 
resentation of a teletext page on a particular football match (see Figure 2). It also 
uses a database that contains fixed background data about e.g., the names of the 
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stadiums and the field positions for each player (defender, goalkeeper). To gener- 
ate sentences, the Generation module uses a set of so-called syntactic templates. 
These are basically syntactic parse trees with fixed parts, carriers , and variable 
parts, slots , in which other syntactic templates can be inserted. An example tem- 
plate is depicted in Figure 3. The templates have conditions attached to them 
about when they can be used. For instance, a template expressing the number of 
spectators of a match can only be used after the match was introduced, e.g. by nam- 
ing both teams. In order to be able to check which information is already known, 
a Knowledge State is maintained. Furthermore, to ensure the well-formedness of 
referring expressions used to fill the template slots, we need information about 
which discourse objects have been mentioned, and how and when they have been 
referred to. This is recorded in the Context State . Each piece of information in 
the data structure can be expressed by at least one template. To allow for more 
variation in the output text, more templates can be implemented to express the 
same information in different ways, which are selected randomly. 



cp 



<time> 



cb 



CONDITIONS: 




cO ip 




<playergen> nb 




<ordinal> nO 

doelpunt 



TOPIC goalscoring 

time <— express:[currentevent.time, currentmatch, c] 

player <— express:[currentevent. player, currentmatch, c] / nom 

playergen <— express: [currentevent. player, currentmatch, c] / gen 



Figure 3: Syntactic template for the sentence Dertien minuten later liet Kluivert 
zijn tweede doelpunt aantekenen 
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In the last stage of text generation, the Prosody module computes the accents 
and prosodic boundaries taking the properties of the Context State into account. 
The accentuation algorithm is based on a version of Focus-Accent Theory (van 
Deemter (1994); Dirksen (1992)), where binary branching metrical trees are used 
to represent the relative prominence of nodes with respect to pitch accent. After 
accentuation, phrase boundaries are assigned. The output of the LGM is an en- 
riched text i.e., a coherent text with prosodic markers (see Figure 2), which is 
passed on to the SGM. The prosodic markers will be discussed in Section 3.1. For 
a more extensive explanation of the LGM see (Klabbers, Odijk, de Pijper, and 
Theune 1996). 

2 Speech output generation methods 

In commercial data-to-speech systems, it is important that the voice output in- 
terface be of high quality. There are several methods to provide a system with 
speech output, each with their advantages and disadvantages. Three methods are 
distinguished here, viz. the use of prerecorded speech, speech synthesis and speech 
concatenation. 

2.1 The use of prerecorded speech 

A maximum degree of naturalness can be achieved by playing back digitally stored 
natural speech. In the past, several information announcement systems have been 
created to provide such services as weather, motoring and tourist information, re- 
cipes, and bed-time stories. The speech output was created by simply making 
recordings of the whole information base and playing a loop or disc continuously 
throughout the day (Waterworth 1984). This approach has two main disadvant- 
ages. Firstly, memory and storage limitations will become a problem once the 
vocabulary of the system becomes too large. Secondly, the approach is highly 
inflexible in that entire messages have to be re-recorded to update the vocabulary. 

For GoalGetter, the vocabulary consists of a limited set of carrier sentences 
and a more extensive set of variable words that can be inserted in the slots ( slot 
fillers). Even though the vocabulary is within limits (approx. 2000 words), the 
total number of combinations is almost innumerable. Adding a new football player 
to the vocabulary would necessitate the recording of a large set of new sentences in 
which this player can occur. Therefore, for GoalGetter, using prerecorded speech 
is not a feasible method. 

2.2 Speech synthesis 

An alternative that yields a maximum degree of flexibility is the use of synthetic 
speech. This method requires much less memory than stored-waveform techniques. 
One way of producing synthetic speech is by allophone or formant synthesis which 
attempts to approximate the acoustic output of a speaker. In the DYD system 
the DECTALK formant synthesizer was used (Allen, Hunnicutt, and Klatt (1987) 
discusses its predecessor MITalk). It models the vocal tract transfer function by 
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simulating formant frequencies, bandwidths and amplitudes. The process is con- 
trolled by 20 - 40 parameters which are updated every 5-10 ms. For this ap- 
proach, extensive knowledge is needed on how the acoustic properties of the speech 
signal evolve over time. The parameters are highly correlated with production and 
propagation of sound in the oral tract. Various sorts of voices can be generated, 
as well as different speaking styles, speaking rates, etc. One of the drawbacks 
of this approach is that the automatic technique of specifying parameters is still 
unsatisfactory. The majority of parameters has to be optimized manually. 

Current speech synthesizers usually produce speech by means of diphone syn- 
thesis. A diphone database consists of small segments excised from human speech, 
that cover the transitions between any two sounds of a given language. The manual 
preparation of the appropriate speech segments can be time-consuming, but once 
the inventory is constructed, there is only moderate computational power needed. 
Diphone concatenation is less flexible than formant synthesis, since only one voice 
can be synthesized. When a different voice is needed, a new diphone database has 
to be constructed. 

Intelligibility of synthetic speech can be quite high. Diphone synthesis usually 
has a higher intelligibility rate than formant synthesis. However, recent evaluations 
show that when both types of synthetic speech are sent through a telephone chan- 
nel, intelligibility decreases significantly. In GSM conditions, intelligibility drops 
even further (Rietveld, Kerkhoff, Emons, Meijer, Sanderman, and Sluijter 1997). 
Furthermore, naturalness still leaves a great deal to be desired. This leads to the 
conclusion that speech synthesis is not yet suitable for use in commercial applica- 
tions. 

Diphone synthesis has been implemented as one of the output modes in Goal- 
Getter in order to test the prosody rules in the LGM. Because the LGM generates 
an orthographic representation with a unique phonetic representation 1 , it is pos- 
sible to do errorless grapheme-to-phoneme conversion by lexical lookup instead 
of rules. The phonetics-to-speech system SPENGI (SPeech synthesis ENGIne), 
developed at IPO, provides GoalGetter with PSOLA-based diphones (Pitch Syn- 
chronous Overlap and Add, Charpentier and Moulines (1989)). However, the pros- 
odic and durational realization rules in SPENGI have not been optimized for the 
GoalGetter domain. In the rest of this paper, we focus on another output mode, 
namely that of speech concatenation. 

2.3 Speech concatenation 

The key to generating high quality speech output is to find a balance in the trade- 
off between naturalness and flexibility. In that respect, concatenating prerecorded 
units like words and phrases appears to be a good alternative. With this approach, 
a large number of utterances can be pronounced on the basis of a limited number 
of prerecorded words and phrases, saving memory space and increasing flexibility. 
This technique is practical only if the application domain is limited and remains 
rather stable. Speech concatenation is used in most voice response services, but 
often the method is so straightforward, that it is not even mentioned in publica- 

x It could also generate a phonetic representation directly. 
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tions. The necessary words and phrases are simply recorded and the concatenated 
sentences are played back when required. This approach has two major problems: 

1. Very careful control of the recordings is needed. Usually, this is not accounted 
for, so that differences in loudness, rhythm and pitch patterns occur, leading 
to disfluencies in the speech. Phrases seem to overlap in time, creating the 
impression that several speakers are talking at the same time, at different 
locations in the room. These prosodic imperfections are often disguised by 
inserting pauses, which are clearly audible and make the speech sound less 
natural. As far as the differences in loudness are concerned, these can be 
remedied by manipulating the overall energy of the material after recording 
without loss in quality. Differences in rhythm and pitch patterns are more 
difficult to correct. PSOLA manipulation only works for some voices without 
deterioration of the speech quality. 

2. The words that serve as slot fillers are recorded in one prosodically neutral 
version only. This makes it practically impossible to exploit the two most 
important features of intonation: 

(a) Highlighting informational structure by means of accentuation, i.e. by 
accenting important and new information, while deaccenting old or given 
information. 

(b) Highlighting linguistic structure by means of prosodic phrasing, i.e. by 
melodically marking certain syntactic boundaries and by using pauses 
at the appropriate places. 

One simple application that takes the prosodic properties into account is a tele- 
phone number announcement system described in Waterworth (1983). In order 
to increase the naturalness of the long number strings, they are split into smaller 
chunks. Digits are recorded in three versions with different intonation contours. 
There is a neutral form , a terminator , with a falling pitch contour, and a continu- 
ant, with a generally rising pitch. Experiments showed that people preferred this 
method over the simple concatenation method. 

Another application called Appeal, which is a computer-assisted language learn- 
ing program, uses a more sophisticated form of word concatenation to deal with 
prosodic variations (de Pijper 1997). The words have been recorded embedded in 
carrier sentences to do justice to the fact that words are shorter and often more 
reduced when spoken in context. The duration and pitch of the words are adapted 
to the context using the PSOLA technique. This ensures a natural prosody, but 
the coding scheme may deteriorate the quality of the output speech to some extent. 

3 Speech output generation in GoalGetter 

Our approach to concatenating words and phrases requires no manipulation or 
coding of the recordings, so the quality of the speech is not affected at that point. 
A good speech output quality is obtained by recording several prosodic variants of 
otherwise identical phrases and words. In this way, a large number of utterances 
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can be pronounced on the basis of a limited number of prerecorded phrases, saving 
memory space and increasing flexibility. This technique can be used whenever 
there is a carrier-and-slot situation, i.e., there is a limited number of types of 
utterances (carriers, templates) to be pronounced, with variable information to be 
inserted in fixed positions (slots) in those utterances. GoalGetter obviously fits 
this situation well. The carriers are the syntactic templates, and these have slots 
for variable information, such as match results, football team names, names of 
individual players, and so on. 

To determine which words and phrases have to be recorded and how many 
different prosodic realizations are needed, a thorough analysis of the material to 
be generated is a necessary phase in the development of a phrase database. 

3.1 Prosodic markers 

As mentioned before the intonation of a sentence should serve to highlight inform- 
ational and linguistic structure. In order to generate the proper pitch contour for a 
given sentence, one needs to integrate intonational, accentual and surface-syntactic 
information. The LGM has this information readily available and passes it on to 
the SGM in the form of prosodic markers. There are two basic types of mark- 
ers: accent markers and phrase boundary markers. In GoalGetter, there are also 
special, application-specific, markers. 

• Accent markers : A word can be either accented or unaccented. In the en- 
riched text, accents are indicated with a double quote (”) before the ac- 
cented word. Deaccentuation rules are based on the given-new distinction 
(van Deemter 1994). As mentioned before, proper accentuation highlights 
informational structure. Deaccentuation is necessary in GoalGetter because 
accentuating given information leads to unnatural results and can even result 
in unintended interpretations. Recently, a third type of accent, viz. contrast 
accent , has been implemented in the LGM. However, the prosodic realizations 
associated with this type of accent have not yet been included in the SGM. 
Therefore, we leave this accent type out of consideration in this paper. The 
interested reader is referred to Theune (1996) (this volume) for a discussion 
on the prediction of contrastive accent in data-to-speech generation systems. 

• Phrase boundary markers: Prosodic boundaries are indicated by slashes in 
the enriched text. The number of slashes (1, 2 or 3) denotes the strength 
of the boundary. The sentence final boundary (///) is the strongest one. 
Words which are clause-final or which precede a punctuation mark other 
than a comma are followed by a major phrase boundary (//). A minor 
boundary (/) precedes a comma and constituents to the left of an F, C’ 
or maximal projection. This is a slightly modified version of a structural 
condition proposed by Dirksen and Quene (1993). 

In longer texts, containing more complicated constructions, one might want 
to distinguish more levels. Sanderman (1996) uses five levels for generating 
texts with more natural phrasing. 
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• Special markers: The symbols % and @ are used to trigger particular appli- 
cation-specific prosodic realizations not immediately related to accentuation 
and boundary marking. They are only used in the phrase concatenation 
mode. In order to use them in the diphone mode we need robust rules that 
specify how these special prosodic versions are realized, which are unavailable 
at the moment. The @-sign is used to mark the numbers reflecting the score. 
This is because in Dutch the score of a match is pronounced in a special way: 
the two accented numbers are realized with a so-called flat hat (a steep rise 
on the first accented word and a steep fall on the second one, with high pitch 
in between), which in Dutch is normally used only if there is no intervening 
boundary. The fact that the first accented word is lengthened and that a 
small pause seems appropriate, on the other hand, suggests that a boundary 
should be there. 

The %-sign is used to mark nouns that are followed by a noun phrase func- 
tioning as an adjunct, as in de %verdediger de Boer kreeg een gele kaart ‘the 
defender de Boer got a yellow card’. The noun de verdediger can also occur in 
isolation where it has a longer duration and often receives an accent. In the 
case where it is marked with a %-sign, a different prosodic variant is chosen 
which is shorter and does not have an accent. This phenomenon seems to be 
general in Dutch and as such ought to be incorporated in the prosody rules. 



3.2 Prosodic realization 

Once the content and prosodic properties of the text is known, a phrase database 
can be developed, which provides the words and phrases that have to be concaten- 
ated. For the slot fillers, we chose to use six different prosodic realizations, one for 
each context described in terms of accentuation and phrasing attributes. Styliza- 
tions of these prosodic realizations are depicted in Figure 4. The special markers 
are not indicated in Figure 4, because they apply to a small group of words only. 



BOUNDARIES 





NONE 


MINOR / MAJOR 


FINAL 


YES 


_U 


U / 


u 




-/ 




NO 


<J 


ju 

X 


u 









Figure 4: Stylized examples of the pitch contours needed 
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The six different prosodic realizations, described in terms of the IPO Grammar 
of Intonation (’t Hart et al. 1990), are: 

1. A slot filler that is accented and does not occur before a phrase boundary 
is produced with the pitch movement that is most frequently used, the so- 
called hat pattern, which consists of an accent-lending rise and fall on the 
same syllable. This contour often corresponds to the prosodically neutral 
version that is used in straightforward concatenation techniques. Sometimes, 
the penultimate and the final accent in a sentence are combined, and instead 
of two hat patterns, one flat hat is realized. In Figure 4 this contour is 
obtained by combining the rise of (1) with the fall of (3). GoalGetter uses 
this construction mainly in time expressions that occur at the end of the 
sentence. 

2. An accented slot filler which occurs before a minor or a major phrase bound- 
ary is most often produced with a rise to mark the accent and an additional 
continuation rise to signal that there is a non-final phrase boundary. A short 
pause is added after the word. 

3. An accented slot filler which occurs in final position receives a final fall. 
A longer pause follows the word. This contour co-occurs with a rise in a 
preceding word. 

4. Unaccented slot fillers are pronounced in a neutral fashion without any pitch 
movement associated to them. 

5. Unaccented slot fillers occurring before a minor or a major phrase boundary 
only receive a small continuation rise. This type of words does not occur 
very often in the GoalGetter domain, since the LGM usually puts a minor or 
major phrase boundary immediately after an accented constituent. 

6. Unaccented slot fillers in a final position are produced with final lowering. 

When recording the material for the phrase database, the slots in the carrier sen- 
tences are filled with dummy words, so that the fixed phrases to be stored in the 
database can be excised easily. The slot fillers such as team and player names 
are embedded in dummy sentences that provide the right prosodic context. The 
sentences are constructed in such a way as to make the speaker produce the stand- 
ard prosodic realization naturally. The intonation in the fixed phrases is not very 
critical, so the speaker may use his own intuitions to determine how to pronounce 
them. 

3.3 Generating speech 

In order to make a text audible, the proper words and phrases have to be concat- 
enated by an algorithm which performs a mapping between the enriched text (with 
accentuation and phrasing markers), and the phrases that have to be selected. The 
different prosodic variants are selected on the basis of the prosodic markers. The 
algorithm recursively looks for the largest phrases to concatenate into sentences. 
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At concatenation time, the slot fillers are surrounded by short pauses of 50 ms, 
which are hardly perceivable, but which give the speech a less hasty character. Be- 
cause the slot fillers usually contain the important information, they are supposed 
to stand out slightly from the rest of the sentence, which is an additional reason 
why introducing small pauses is not disturbing. 

3.4 Selection of speaker and speaking style 

The choice of an appropriate speaker is essential for the success of the applica- 
tion. Cox and Cooper (1981) conducted a survey to find out what properties in 
a human’s voice make it suitable for use in a telephone information system. The 
results showed two important factors influencing the preferences of the listeners, 
i.e. agreeableness and assertiveness (which is also associated to the notion of self- 
confidence). In their experiments, female speakers were marked up for assertiveness 
whereas male speakers were marked down for that quality. Because of this prop- 
erty, there seemed to be a slight preference to use a female speaker in telephone 
announcement systems. 

Speaking style also constributes to the output quality of the speech. Two im- 
portant factors associated to speaking style are speaking rate and pitch range. 
When selecting a speaker, these factors have to be taken into account. A speaker 
should not speak too fast, since that gives the concatenated speech a restless, 
nervous quality. Especially small words like function words will sound as if they 
have been cut off abruptly. A speaker’s pitch range should not be too excessive, as 
disfluencies in the speech are more likely to occur. 



4 Conclusion 

This paper describes a method for speech generation in the GoalGetter system. It 
has been demonstrated that with a sophisticated phrase concatenation technique, 
we can obtain speech output with a very good quality. As mentioned before, this 
technique is only suitable when there is a stable and fairly limited application 
domain. Once the language generation module generates too flexible output and 
the slot fillers change continuously, the phrase concatenation technique will prove 
to be too inflexible. Therefore, we are continuing our efforts to improve the diphone 
synthesis technique. 
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Possessive affixes and complement composition 



Dimitra Kolliakou ** 



Abstract 

A long-standing issue in the literature on clitics, namely, whether they can 
be best analysed as affixes or syntactically autonomous words (postlexical 
clitics), is here addressed with respect to the Modern Greek ‘weak form’ 
possessive pronoun. It is argued that distributional and phonological evid- 
ence strongly support an affixal analysis. Apparent difficulties for extending 
to possessive affixes an HPSG account that has been previously employed 
for pronominal affixation in Romance VP are overcome, and a composition 
approach is proposed - one which takes a categorial grammar approach to 
adjectives and treats them as heads in NP, and does not require flat NP 
structures that lack independent motivation in Modern Greek. 



1 Introduction 

A long-standing problem in the literature on clitics has been the issue of whether 
they can be best analysed as affixes or as syntactically autonomous words. Ac- 
cording to one proposal, cf. e.g. Anderson (1992), all ‘clitics’ are phrasal affixes 
and should not be assigned the status of nodes in syntactic markers at all. An- 
derson discusses data from a number of languages and shows that there are very 
substantial similarities between the principles governing the placement of ‘clitics’ 
and those for the placement of affixes. An affixal approach to clitics will permit 
such generalizations to be expressed, and, moreover, dispense with, in his view, ad 
hoc syntactic categories such as clitic or particle. According to an alternative view, 
cf. e.g. Zwicky and Pullum (1983) and for a recent discussion Halpern (1995), it is 
essential to distinguish between (a) affixal clitics that are lexically attached, and 
(b) postlexical clitics (PLC) that have the syntax of phrases but prosodically are 
part of a Clitic Group - a prosodic category that is bigger than the phonological 
word and smaller than the phonological phrase , and which consists of a phonolo- 
gical word (host) and one or more clitics, cf. Nespor and Vogel (1986). If the view 
represented by Anderson is right, then what remains to be done is to work out the 
specifics for each particular family of affixes. Otherwise, the issue of affix versus 
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PLC status is a burning one. I start therefore by considering this issue with re- 
spect to the Modern Greek ‘weak form’ possessive pronoun, henceforth, MG POSS 
(Sections 2 and 3). Given that I conclude that MG POSS is an affix, I do provide a 
detailed account of its morphosyntax. Thus, in Section 4 I show how a composition 
approach, in the spirit of previous work on pronominal affixation couched in the 
framework of Head-driven Phrase Structure Grammar (HPSG), can be extended 
to MG POSS, despite apparent empirical and conceptual difficulties for extending 
to NP an approach originally intended for the placement of pronominal affixes in 
Romance VP. 



2 Basic data and previous approaches 

By way of introduction, I mention a couple of essential and undisputed facts about 
MG POSS. First, MG POSS is an enclitic or suffix, as can be demonstrated by 
evidence from stress. It has often been observed that in MG lexical stress is allowed 
on any one of the last three syllables of a word but no further to the left (Stress Well- 
Formedness Condition (SWFC)). If a potential host for POSS is stressed on the 
antepenultimate, as e.g. kaliteros (‘best’), once POSS is attached, a stress is added 
two syllables to the right, as in kaliteros in (1), to satisfy SWFC; the additional 
stress is in fact perceived as the main stress of the word, whereas the original lexical 
stress weakens, cf. Arvaniti (1992). 

(1) o kaliteros MU filos 

the best POSS.lsg friend 
‘my best friend’ 

Second, POSS exhibits a ‘floating’ distribution: it can attach to a specifier (2a), 
any prenominal adjective (2b-c), or the noun (2d). However, multiple possessive 
marking, as e.g. in (2e) is not allowed. 

(2) a. ola TUS ta-prosfata epistimonika arthra 
all POSS.3pl the-recent scientific papers 
‘all their recent scientific papers’ 

b. ola ta-prosfata TUS epistimonika arthra 

c. ola ta-prosfata epistimonika TUS arthra 

d. ola ta-prosfata epistimonika arthra TUS 

e. *ola TUS ta-prosfata epistimonika arthra TUS 

Previous approaches treat MG POSS as an affix. Sadock (1991) proposes that it 
is a penultimate word suffix. Nonetheless, as pointed out by Halpern (1995), this 
proposal cannot account for the patterns illustrated in (2a&b). In a similar spirit, 
Stavrou and Horrocks (1990) argue that MG POSS is a sister of N° that in D- 
structure attaches to N° or Adj° by means of ‘a morphological rule that can apply 
at the syntactic level’, and which yields lexical (X°) categories. However, unlike 
Sadock’s autolexical syntax approach, Stavrou and Horrocks make no explicit pro- 
posal concerning the interface between syntax and morphology. To the best of my 
knowledge, no PLC analysis of MG POSS has been previously proposed. However, 
a prosodic approach to the Ancient Greek possessive enclitic (AG POSS) which 
assigns it PLC status is provided by Taylor (1996). I summarize this proposal and 
consider whether it can be extended to MG. 
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The following data (from texts dating back to 0-300 AD) demonstrate that AG 
POSS can appear in two positions: (a) the NP-initial position, henceforth, 1W 
(first- word-position), as shown in (3a&b), and (b) in second-position, in fact, the 
position after the first word (position 2W a la Halpern (1995)), as shown in (3c&d). 

(3) a. kai peisthesontai [a rp SOU tais-rhemasin] 

and they- will-trust [np POSS.2sg the-D(AT) words-D(AT)] 

‘and they will trust your words’ 

b. ean [np MOU ten-entolen] phulaksei 

if [np POSS.lsg the-A(CC) commandment-A(CC)] he-keeps 
‘if he keeps my commandment’ 

c. aph’ hes [v elalesas] autois [np tas-entolas MOU] 

from when you-told them-D [yyp the- A commandments- A POSS.lsg] 

‘from the time when you told them my commandments’ 

d. metelabon hoti bareos douleuete [np ten-kurian HEMON meteran] 

I-understand that grudgingly you-serve [np the- A lady- A POSS.lpl mother- A] 

‘I understand that you serve our lady mother grudgingly’ 

Following a tradition which assumes that 1W and 2W are related and derives 
the latter from the former in a ‘post-syntactic’ component of the grammar, Taylor 
provides a prosodic account of the 1W/2W alternation: AG POSS is taken to be a 
simple clitic in the sense of Zwicky (1985), which, however, is sensitive to phonolo- 
gical phrase (4>) boundaries. As shown in (4a-b) below, AG POSS is syntactically 
left-adjoined to NP and prosodically attaches to a linearly preceding host outside its 
domain, due to its enclitic status, thus giving rise to syntax-phonology mismatches 
of the type discussed in Klavans (1985). In case a non-optional boundary (‘#’) 
precedes the clitic, i.e. one that cannot be eliminated by restructuring in the 
sense of Nespor and Vogel (1986), as e.g. in case of (4c-d), Prosodic Inversion is 
triggered (cf. Halpern (1995), Taylor (1996)), which essentially amounts to allowing 
the linear order of host and clitic inside the Clitic Group to be the reverse of their 
linear order in the syntax. (‘=’ marks prosodic attachment.) 

(4) a. [yp [v peisthesontai] [np [cl SOU] [np tais-rhemasin]]] 

b. ($> peisthesontai =SOU tais-rhemasin) 

c. [vp [v elalesas] [np autois] [np [cl MOU] [np tas-entolas]]] 

d. (($ elalesas) ($ autois)# ($> tas-entolas =MOU)) 

A crucial difference between AG and MG POSS is that the latter never attaches 
to a host outside its syntactic domain: (3a&b) are not available options in MG. To 
start with, this is a distributional pattern that pertains to affixes but not PLCs, 
as is argued at length in Halpern (1995). In addition, unless it could be shown 
that boundaries are obligatory in MG in all contexts where Taylor takes them 
to be optional for AG, this distribution requires one to assume that the syntactic 
domain of MG POSS is never a maximal projection (NP), but rather the N'. 1 This 
would nonetheless go contrary to most of the literature which exclusively allows 
maximal categories to constitute the syntactic domain or scope of clitics. Let us 
for the moment ignore these two initial difficulties and assume that MG POSS is 

besides, unless we take the N' as the domain, we cannot account for examples such as (2c) 
which are clearly not instances of second-position in NP, but rather instances of second-position 
(2W) in N'. 
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syntactically left-adjoined to N' and prosodically attaches to a linearly preceding 
host, due to its enclitic status. Such a proposal will account for examples (2a-c), 
which conform to the pattern [X [/w POSS [/y/ Y Z]]], i.e. they contain a clitic in N'- 
initial position. 2 Moreover, the requirement for left as opposed to right adjunction 
will allow us to account for an additional fact, illustrated in (5), namely, that 
postnominal adjectives cannot host POSS: 

(5) a. ena arthro TU prosfato 

one paper POSS.3.masc/neut.sg recent (‘a recent paper of his’) 
b. *ena arthro prosfato TU 

To account for 2W, e.g. (2d), which under Taylor’s proposal would be derived from 
a syntactic structure [w epistimonika [/y/ TUS [w arthra]]] by applying prosodic 
inversion, we must further assume that words can constitute (optional) phonolo- 
gical boundaries in MG NP - an assumption that has not previously been made. It 
can perhaps be argued that there are alternative ways for accounting for 1W/2W 
alternations, e.g. by assuming that (2a-d) are all instances of discontinuous con- 
stituency, where a possessive clitic syntactically combines with an NP or N', is 
permitted to interleave with its daughters, and appears after the leftmost daughter 
in NP or N' - a specifier, adjective, or the noun. In the next section, I argue that 
the PLC approach to MG POSS, no matter whether prosodic or syntactic, will 
encounter a number of distributional problems. 



3 Arguing for an affixal approach 

A first problem for an analysis of MG POSS as a postlexical clitic is the existence 
of unexpected gaps in 1W and 2W. Along with the grammatical (6a), a prosodic 
or syntactic analysis along the lines of the one sketched above will allow for the 
ill-formed (6b), with MG POSS in 1W, following the complement of a preceding 
adjective. (6b) cannot be excluded by appealing to an obligatory phonological 
phrase boundary to the left of tu since it is generally agreed that material inside 
the NP that precedes the noun plus the noun itself form a single phonological 
phrase. 3 

(6) a. ta gnosta [se olus]] kolpa TU 

the familiar to everybody tricks POSS.3.m/n.sg 
(‘his tricks familiar to everybody’) 

b. *ta gnosta [se olus]] TU kolpa 

An approach a la Taylor would also permit the ill-formed (7b) to be derived by 
prosodic inversion from (7a), where POSS is syntactically left-adjoined to the N' 
whose first word is the adverbial entelos. Though a syntactic PLC approach could 
perhaps rule out (7b) by constraining MG POSS to exclusively interleave with 

2 In (2a), ola is taken to be the specifier, whereas ta is part of the N. This is consistent with 
the fact that the MG definite article does not shift the type of the phrase it occurs in from N' to 
NP, rather it is a proclitic/prefix that can multiply occur in the same NP, as in (15) below. 

3 Neither could it be argued that MG POSS for some reason requires a strictly lexical host and 
resists a preceding phrase: not only does this assumption violate the spirit of the PLC approach, 
but it is also empirically refuted by the gram mat icality of, say, (ta) [ap para poli gnosta] TU kolpa 
(lit.: the very much familiar POss.3.masc.sg tricks; ‘his very familiar tricks), where tu is again 
preceded by an AP which this time consists of an adjective and a pre- adjectival degree modifier. 



Kolliakou 



73 



the daughters of an N'/NP, rather than the daughters of an AP embedded inside 
that N'/NP, this latter type of approach would fail to allow for the grammatical 
(7c). In (7c), tu is again embedded inside an AP, but this time it is attached to the 
adjective, whereas the modifier occurs in post-head position. The contrast between 
(7b&c) shows that APs cannot be homogeneously treated as syntactic boundaries 
for the purposes of clitic placement. 

(7) a. orismenes TU [#/ [ap entelos katestramenes] tixografies] 

certain POSS.3.m/n.sg totally ruined frescos (‘some of its totally ruined frescos’) 

b. *orismenes [ap entelos TU katestramenes] tixografies 

c. [ap to apagorevmeno TU dia nomu] vivlio 

the forbidden POSS.3m.sg by law book (‘his forbidden by law book’) 

The distribution patterns in (6) and (7) (as well as the example in footnote (3)) can 
be straightforwardly accounted for if MG POSS is viewed as part of the morphology 
of nominal categories (determiners e.g. orismenes, adjectives e.g. apagorevmeno, and 
nouns e.g. kolpa). In terms of Miller (1992) and Halpern (1995), this amounts to 
treating POSS as ‘extended’ inflection that exhibits head percolation , that is, if it 
is to be located inside an AP daughter of N', it will be attached to the adjective 
head, if it is to be located inside the specifier, it will be attached to its determiner 
head, and so on; therefore, it cannot be found in the right or left edge of a daughter 
of N'/NP, as in the ungrammatical (6b) and (7b), respectively. 4 

A second problem for the PLC approach is the fixed order of POSS and NP- 
internal demonstratives - the latter must always follow the former inside NP, as 
shown in (8). It is commonly assumed that NP-internal demonstratives in MG have 
the syntactic status of adjectives. Given this, the PLC approach would predict 
that both (8a&b) should be grammatical, in either case tu being syntactically left- 
adjoined to an N', and prosodically attached to a preceding adjective. An affixal 
approach on the other hand can circumvent this problem. Affixation is a lexical 
matter, therefore, there can be exceptions in a given morphological paradigm. 
NP-internal demonstratives can be treated as such an exception in that, unlike 
other adjectives, they do not participate in the morphological process of possessive 
affixation. Further such exceptions will be provided below. 

(8) a. ta-kenuria TU afta vivlia 

the-new POSS.3sg these books (‘these new books of his’) 

b. *ta-kenuria afta TU vivlia 

the-new these POSS.3sg books 

A third problem for a PLC analysis is a type of obligatory ‘possessive doubling’ 
that mysteriously applies only in case of first and second person, but not in third. 
As illustrated in (9) below, a first person singular phrasal possessive can occur in 
an NP where a possessive affix is also present, but is not licit otherwise. Note also 

4 A potential counterexample for the ‘head percolation’ generalization is the fact that POSS 
cannot intervene between an adjective and its phrasal complement, e.g. the string *ta gnosta TU se 
olous kolpa is ill-formed - compare with the grammatical (6a). Notice, however, that this example 
would also be a problem for the prosodic inversion approach discussed above, and, moreover, 
for a syntactic approach assuming discontinuous constituency and which would account for the 
grammatical (7c). Both types of account would allow for the ‘surface’ ordering [Y [ap Adj 
POSS XP] Z ... }. The composition-based proposal in Section 4 provides a tentative solution to 
this problem by assigning possessive morphology to adjectives which do not subcategorize for a 
thematic complement. 
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that despite the fact that phrasal possessives in MG are NPs in genitive case, when 
in first or second person singular and plural, they occur in accusative, since the 
first and second person paradigm is defective and no genitive forms are available 
synch ronically. It is unclear how a PLC account can relate the adjoined-to-N' 
POSS with a phrasal possessive that presumably occupies the noun’s complement 
position, and in fact, in such a way that only first and second person elements 
are affected. In Section 4, I propose a way for accounting for such idiosyncratic 
doubling under an affixal approach to POSS based on composition. 

(9) a. *to vivlio emena 

the book mine- ACC (putatively: ‘my book’) 

b. to vivlio MU (emena) 

the book POSS.lsg mine-ACC (‘my book’) 

Consider now some phonological evidence in favour of an affixal approach. In 
MG, two phonological rules can be identified whose domain of application is the 
word, where ‘word’ is to be interpreted either as a plain inflectional form, or as 
an inflectional form plus possessive suffix combination. First, SWFC (see above) 
affects lexical stress in an entirely predictable way (a) in case of inflectional affix- 
ation, where the main stress moves one syllable to the right, as e.g. in mathima 
(lesson-NOM/ ACC.SG) — ► mathfmatos (lesson- GEN. SG), and (b) in what has been 
traditionally referred to as cliticization , here analysed as ‘extended’ or phrasal af- 
fixation. Though in (a) and (b) lexical stress is affected in two different ways - 
recall that in case of possessive affixation a new stress (main stress) is added two 
syllables to the right of the original lexical stress, whereas the latter weakens - this 
difference can be accommodated in the account proposed below which identifies 
two types of morphology for a given nominal word: plain morphology, which cor- 
responds to inflected forms, and clitic morphology, which corresponds to inflected 
forms that also bear a possessive suffix, ‘clitic’ bearing no theoretical significance 
in this piece of terminology. The second rule is the voicing of a stop when it is 
preceded by a nasal. Stop Voicing (SV) applies inside a plain morphology word, 
as e.g. in case of avriOearj (‘antithesis’) — > [andithesi], and, moreover, inside a 
clitic morphology word, as e.g. in KaOrj^rjTUJv rov (professors POSS-3.meisc.sg; ‘his 
professors’) — > [kathigiton du], but not across words. 

A further piece of evidence in favour of an affixal analysis is the existence of 
certain exceptions or ‘arbitrary gaps’ in the set of ‘host’-POSS combinations (cf. 
Zwicky and Pullum (1983)). For example, though POSS can occur inside indefinite 
NPs, as has been previously demonstrated in (5a), particular members of the de- 
terminer class appear to resist a possessive suffix. My consultants agree that there 
is a contrast between (lOa&b), despite that fact that merikes and orismenes do not 
appear to have different properties otherwise, e.g. they can both occur in the par- 
titive construction (merikes/orismenes apo tis fotografies TU; ‘some/certain of his 
pictures’), and neither of the two is licit inside definite NPs, unlike e.g. the car- 
dinals (i-tris/*i-merikes/*i-orismenes fotografies; ‘the three/*the-some/*the-certain 
pictures’). 5 

5 POSS can function as ‘object of comparison’ for a number of comparative adjectives, which 
are not discussed here, due to space limitations. There too, the potential for possessive affixation 
is lexically determined given arbitrary contrasts such as megaliteros ap’afton / megaliteros TU 
(‘older than him’ / ‘older P0SS.3.masc.sg’), versus spudeoteros ap’afton / *spudeoteros TU (more 
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(10) a. ??merikes TU fotografies 

some POSS.3.m/n.sg pictures (putatively: ‘some of his pictures’) 

b. orismenes TU fotografies 

certain POSS.3.m/n.sg pictures (‘certain of his pictures’) 

Finally, a few words about the coordination diagnostic are in order. Potential for 
wide scope over a coordination of hosts is taken to support a PLC approach, and 
vice versa, cf. Miller (1992). E.g. the ill-formedness of *Pierre les voit et ecoute, 
as opposed to Pierre les voit et les ecoute (‘P. sees them and hears them’) argues 
in favour of affix status for French VP ‘clitics’ such as les. Unfortunately, the 
same test cannot provide conclusive evidence in case of MG POSS, since the latter, 
unlike VP pronominal affixes, is not mandatory. Though for some speakers o- 
kathigitis ke i-sinaderfos MU (the-professor-MASC and the-colleague-FEM POSS.lsg) 
can be assigned either of the readings ‘the professor and my colleague’ (preferred 
reading) and ‘my professor and my colleague’, this does not unambiguously indicate 
that POSS can take wide scope. Rather, usage of an NP that does not contain 
a possessive (e.g. o kathigitis) can often imply the existence of a ‘possessor’ (in 
this case, student of the professor), even outside coordination contexts with the 
rightmost conjunct bearing POSS. Even if we were to assume that the reading ‘my 
professor and my colleague’ can only be due to the possessive suffix’s taking wide 
scope, that still wouldn’t commit us to the PLC analysis: as Miller has shown, the 
credibility of the coordination test varies considerably, and ‘it cannot be argued 
that an item is necessarily not an affix because it can have wide scope’ (1992:157). 
To this effect, Miller provides examples where elements for which he claims affix 
status appear to exhibit wide scope in coordination e.g. the definite and indefinite 
article in French (Le/Un collegue et ami de mon pere; ‘the/a friend and collegue 
of my father’), and shows that this is also true for elements whose affixhood is 
undisputed and is also reflected in the orthographic tradition, as e.g. anti in C'est 
un juge anti-dommage et interets (‘He is an anti-compensation judge’). 



4 A complement composition approach 

Previous HPSG accounts of pronominal affixation (cf. e.g. Sag and Miller (1997) 
and Abeille, Godard, and Sag (1997) for French) rely on composition , a notion 
reminiscent of division categories in categorial grammar, originally incorporated 
into HPSG by Hinrichs and Nakazawa (1994). By composition, a functor (e.g. an 
auxiliary verb such as avoir ‘have’) can combine with an unsaturated argument 
(e.g. a participle such as donne ‘given’) whose valence requirements have not been 
satisfied, and also, directly , with the arguments of that participle (e.g. le livre ‘the 
book’ and a Marie ‘to Mary’). Alternatively, the arguments the auxiliary ‘inherits’ 
from the participle can be realized as affixes, which allows to account for various 
instances of the phenomenon traditionally known as clitic climbing , such as e.g. in 
le-lui- avons donne (‘we have given it to her/him’). This section reaches a conclusion 
that at first sight might appear rather surprising, namely, that an approach origin- 



important than him / *more important POSS.3.masc.sg). 
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ally proposed for VP pronominal affixes in Romance can be extended to account 
for the possessive suffix in the Modern Greek NP. 

A composition approach to possessive affixation in NP requires that both de- 
terminers and adjectives should be treated as heads. In the last decade, most work 
on the syntax of determiners proposes their being treated as the head of the phrase 
they occur in. Such accounts for instance include the seminal work of Abney (1987) 
and within the HPSG framework the proposal of Netter (1994). Recent work in 
MG is also in line with this view (cf. e.g. Stavrou (1991), and Kolliakou (1995) for 
an HPSG approach.) Assuming a head treatment of determiners a la Netter (1994), 
and unlike Pollard and Sag (1994), a ‘composition’ determiner can be assigned the 
argument structure shown in (11) below by which it can select for a lexical noun 
and ‘inherit’ the arguments of that noun. 6 ® stands for append. 

(11) Determiner (preliminary version) 



CAT 



HEAD det 

ARG-ST < N[ARG- ST [TJ > ©Q 



The argument structure in (11) allows for two possibilities: (a) the determiner’s 
combining with its arguments ‘in the syntax’, as shown in (12a), where both argu- 
ments of tria (‘three’) are typed canon(ical) - canon is a subtype of synsem and it 
signals that [T] and [ 2 ] are to be syntactically realized: signs (which are specified 
with a phonological value) are constrained to have canonical synsems, cf. Sag and 
Miller (1997), Abeille, Godard, and Sag (1997) - and (b) the determiner’s realiz- 
ing the inherited possessive argument as an affix - aff(ix) also being a subtype of 
synsem which however is never associated with an attribute PHON, to constitute a 
word or phrase; this is shown in (12b) where the determiner’s morphology consists 
of the inflected form tria (‘three’) and the possessive suffix tu (POSS.3.SG). 



( 12 ) 



DP 



ARG-ST 




tria arthra tu Ilia 

three papers of Bias’s 



b. 



DP 




h c 

ARG-ST ( [l]c anon, [£]aff ) (T][ARG-ST ( [Yjaff ) 
tria-tu arthra 



three-POSS.3sg 



papers 



6 I am here following Sag and Godard (1994) in assuming that possessives are arguments of 
nouns. 
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However, with ARG-ST being an HPSG feature of lexical heads that does not 
propagate onto phrases, this type of composition approach gives rise to very flat 
structures like (12a) which lack independent motivation in Modern Greek. Rather, 
evidence from pronominalization provides support for a hierarchical structure of 
the type illustrated by the bracketing in (13a): the partitive tus can replace a 
single NP constituent that includes the noun’s complement tu Ilia, as in (13b), 
rather than a lexical N complement of the determiner (see ill-formed (13c)). 

(13) a. [tris [fili [tu Ilia]]] (‘three friends of Ilias’) 

b. tris tus (‘three of-them’) 

c. *tris tus tu Ilia (lit. three of-them of Ilias) 

Moreover, an approach based on argument composition in the above sense will 
encounter a problem in case of nominal phrases which also contain adjectives: if 
the determiner’s ARG-ST contains a lexical N and its arguments, it is not clear 
how adjectives can fit in. To accommodate adjectives, the argument composition 
analysis would require them to be treated as heads, rather than adjuncts, when 
combining with N's, and AP mothers to ‘dominate’ noun heads and their comple- 
ments. Though such a proposal is not novel in the literature (cf. e.g., in (Abney 
1987)), it must be suitably formulated so as to accommodate the fact that ad- 
jectives can iterate, and, moreover, constrained so as to yield the right scoping in 
adverbial modification. These issues are addressed below. 

Let us start with the last issue - the feasibility of a head approach to adject- 
ives. A treatment of adjectives as heads of nominal projections is familiar from 
categorial grammar where adjectives are assigned the type NP/NP i.e. they are 
treated as functors that take an NP argument and yield another NP. In HPSG 
terms, this effect can be achieved (a) by modelling the types adjective and noun , 
which constitute appropriate values of the feature HEAD, as subtypes of a super- 
type nominal , also meant to constitute an appropriate value of HEAD, and (b) by 
assuming that phrases consisting of an adjective and an N' are of type hd-comp- 
ph (head-complement-phrase), cf. Sag (1997), or in terms of earlier HPSG work, 
they satisfy the head-complement ID-schema. The details of this proposal will be 
provided shortly. 

A variety of arguments for treating adjectives and nouns in MG as partly unified 
categories are provided in Kolliakou (1995). I will here confine myself to some mor- 
phological and syntactic evidence in support of this position. MG adjectives and 
nouns fall under the same morphological paradigms and both categories are mor- 
phologically marked for case and person/number/gender agreement. (14) provides 
one adjective and one noun example of a given morphological paradigm in MG: 

(14) a. kenuri-o (new-N(OM)/A(CC) .3SG.NEUT); podilat-o (bike-N/A.3SG.NEUT) 

b. kenuri-u (last-GEN.3SG.MASC/NEUT); podilat-u (bike-GEN.3SG.NEUT) 

c. kenuri-a (last-N/A.3PL.NEUT); podilat-a (bike-N/C.3PL.NEUT) 

d. kenuri-on (last-GEN.3PL); podilat-on (bike-GEN.3PL.NEUT) 

Syntactic overlap in the distribution of APs and NPs is manifested in at least two 
contexts. First, both adjectives/ APs and nouns/NPs can ‘host’ a definite article 
in the multiple definite marking construction, as shown in (15) below. No matter 
whether MG definite articles are to be treated as prefixes or proclitics (postlexical 
clitics), an account of their morphological or syntactic realization, respectively, 
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needs to determine appropriate ‘hosts’ which they can be affixed to or cliticized 
onto. The supertype nominal allows to generalize over the MG definite article’s 
suitable hosts. 

(15) To [iVomP (kenurio) podilato] to [notuP kokino] 

the (new) bike the red 

‘the (new) red bike’ 

Second, both adjectives and nouns can function as complements of higher heads in 
‘canonical’ and ‘elliptical’ contexts. (16a) below shows that in MG a ‘bare’ singular 
count term can function as a ‘maximal nominal projection’, one which saturates 
a subcategorization requirement of a verb head. In the elliptical context of (16b), 
the same verb satisfies its valence requirement for an accusative direct object by 
solely combining with an adjective. Similarly, a determiner selects for a nominal 
complement which can be instantiated either as a noun or as an adjective, as shown 
in (16c&d), respectively. Modelling adjectives and nouns as subtypes of nominal 
- the category of both verb and determiner complements in MG 7 - will enable 
a unified account of the syntax of ‘canonical’ and ‘elliptical’ constructions to be 
provided - one that does not posit phonologically null noun heads in case e.g. of 
(16b&d), and is in the spirit of semantic approaches to ellipsis resolution that do 
not assume reconstruction in the syntax (e.g. Dalrymple, Shieber, and Perreira 
(1991)). 

(16) a. Agorasa [womP (kenurio) vivlio] 

bought- l.SG (new) book 

‘I bought a (new) book.’ 

b. Exasa to vivlio mu ki agorasa [xomP kenurio]. 

lost-l.SG the book POSS.l.sg and bought- l.SG new 

‘I lost my book and bought a new one’ 

c. Vrikes kanena [jv om p isitirio?] 
get-2. SG any ticket? 

‘Did you get a ticket?’ 

d. I times ton isitirion pikilun. Kita na vris kanena [jv om p ftino]. 

the prices of the tickets vary. try to get any cheap 

‘The prices of the tickets vary. Try to get a cheap one.’ 

(17) illustrates the proposed hierarchy of parts-of-speech (p-o-s) which includes 
nominal 

(17) Parts-of-speechs (p-o-s): 



p — o — s 




nom verb prep det 



adj noun 

The feature structure in (18) below with a preliminary ARG-ST value illustrates 
an adjective selecting for an N' complement, N' being an abbreviation for objects 

7 Of course MG verbs also take DP complements. A slightly different proposal which accom- 
modates this fact is provided in Kolliakou (1995). 
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specified HEAD nominal and whose valence features do not contain any canon- 
ical elements, i.e. their subcategorization requirements have already been satisfied 
(but see below). (18) will permit phrases containing no, one or more adjectives, 
the noun, and the noun’s phrasal complements (if any) to syntactically combine 
with an adjective head which will in turn head a Nominal Phrase. 8 Case concord 
and agreement in person/number/gender between the adjective head and its N' 
complement is straightforwardly accounted for by structure-sharing ([TJ and |~2~| , 
respectively.) The RESTR value of the adjective is specified exactly as in the ad- 
junct approach to adjectives of Pollard and Sag (1994), by lexically unioning the 
set of psoas [ 3 ] contributed by the N' with the adjective’s own psoa. 



(18) Adjective as head (with preliminary version of ARG-ST value): 

PRD - 



SS I LOC 



CAT 



HEAD adj 



CASE [T] _ 



ARG-ST < N' 



CASE ] l| 
INDEXp] 
RESTR [31 



CONT 



INDEX [T] 
REST R psoa U [ 3 ] 



Note furthermore that a head approach to adjectives couched in HPSG will not 
encounter any particular problems with constraining adverbial scope over the ad- 
jective alone as e.g. in (19b) below, rather than over the AP, as in the incorrect 
(19c). HPSG does not require a one-to-one mapping between syntactic structure 
and scope. In fact, as shown by Kasper (1996), the ‘traditional’ HPSG analysis 
of adjectives as adjuncts of N' provided by Pollard and Sag (1994) makes wrong 
predictions for recursive modification, i.e. it will incorrectly derive (19c) as the 
content of [[apparently indifferent] behaviour]. A treatment of adverbials as on a 
par with complements with respect to subcategorization, as e.g. proposed by van 
Noord and Bouma (1994) and Kim and Sag (1995) among many others, appears 
to permit a correct content value to be lexically specified, one with adverbial scope 
exclusively including the psoa of the adjective head, rather than a set of psoas also 
containing those contributed by the adjective’s N' complement. The details of such 
a proposal cannot be discussed here due to space limitations. 



(19) a. syntax: [apparently [ ap indifferent [behaviour]]] 



®This proposal as it currently stands will allow for ‘elliptical’ nominals with recursive adjectives 
e.g. (??)Agorases [qp kanena [;vomP kenurio [/VomP ftino]]]? (lit. bought-2.SG any new cheap? 
putatively: ‘??did you buy any new cheap one?’), which are not considered to be grammatical 
by all speakers. One way for eliminating such nominals could be by distinguishing between 
adjectives and nouns, the latter being specified N+ in all instances, whereas the former being N — 
as a default, but N+ when combining with an NP complement - their complement’s specification 
overriding their own default when the two are put together. Assuming that adjective heads select 
for N+ nominal complements, elliptical examples with recursive adjectives will thus be ruled out. 
A shortcoming of this proposal is however that it appeals to a notion of default which has not 
hitherto been commonly assumed in HPSG work, but see order independent default unification 
by Lascarides, Briscoe, Asher, and Copestake (1996). Thanks to Ivan Sag for pointing this out 
as an issue. 
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b. content: behaviour' (x) A apparent' (indifferent' (x)) 

c. incorrect content: apparent' (behaviour' (x) A indifferent' (x)) 

I will now present a composition approach which (a) is similar to the one proposed 
for French causative constructions by Abeille, Godard, and Sag (1997) in that 
the ‘composed’ elements are members of COMPS rather than ARG-ST, and (b) 
differs from the composition approaches to pronominal affixation in Romance in 
that it maintains hierarchical structuring by exclusively permitting affix members 
of COMPS to be ‘inherited’ by ‘composition’ heads. 

Following Sag and Miller (1997) and Abeille, Godard, and Sag (1997) for verbs 
in Romance, I assume two types of nominal words: plain-nominal-word (pl-nom- 
wd) and clitic-nominal-word (cl-nom-wd). The phonology value of pl-nom-wd is 
the basic inflected form that does not bear a possessive suffix. It is computed 
by an inflectional function (Fm) which will have to take into account the root 
value specified in the lexeme type as well as information specified inside the index 
(person/number/gender agreement.) On the other hand, the phonology value of 
cl-nom-wd in addition incorporates a possessive suffix. It is computed by a two 
argument function - a simpler version of Sag & Miller’s F praf - its first argument 
H~| being the PHON value provided by Fm, and its second argument [ 2 ] the word’s 
ARG-ST value. 







lexeme 




pi — nom — wd 


(20) 


(a) 


PHON © 

_ ARG — ST [ 2 ] _ 


(b) 


PHON < Fm©) > 
_ ARG - ST [|] 






cl — nom — wd 







(c) PHON < iW©® > 
ARG -ST [J] 



Plain nominal forms are defined as having their argument structure list correspond 
to the concatenation of their valence features SPR and COMPS 9 plus a possibly 
empty list of gaps, in case some argument is being extracted. The list value of 
COMPS is unconstrained, thus allowing for plain morphology adjectives and nouns 
that contain an affix element in their COMPS list (and by unification in their ARG- 
ST list too). This is crucial for the type of composition approach proposed, one 
which is compatible with hierarchical NP structures. O stands for the shuffle 
operation. 

(21) Valence features and ARG-ST of pl-nom-wd : 
pi — nom — wd 
SPR[ <T| 

COMPS [T] 

ARG — 5T | 0 | 0 |T[ Q list(ggp) 

For clitic morphology nominal forms, on the other hand, ARG-ST comprises a 
(potentially empty) list-of-length-one of affixes on top of the SPR and COMPS 
lists (and a potentially empty list of gaps.) It should be mentioned here that the 

9 I am following Sag and Godard (1994) in assuming that the SUBJ list of nouns is always 
empty and for simplicity have omitted it from ARG-ST. Notice also that since determiners are 
not treated here as specifiers, and, moreover, MG provides for no ‘possessive determiners’ e.g. in 
English John's in John's poem, the SPR list of most nominal examples considered in this paper 
is also empty. A possible exception is the case of adverbials which can perhaps be treated as 
specifiers of adjectives, rather than complements (see discussion above). 
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only type of noun complement that can be realized in the morphology as an affix is 
one that is realized in the syntax as a genitive NP. A noun head in MG can take at 
most one phrasal genitive or affix, a fact which can be formulated as a constraint 
on ARG-ST lists of MG nous. An argument typed gen , to be realized as a phrase 
or affix, is also the leftmost argument inside the noun’s ARG-ST. Note that the 
COMPS list of cl-nom-wd is constrained to exclusively contain canonical elements. 
As will become clear below, this is crucial for preventing the multiple realization 
of a given possessive in the same NP. 

(22) Valence features and ARG-ST of cl-nom-wd : 

cl — nom — wd 

spr\q\ 

COM PS | 1 |(c anon) 

_ ARG - ST[o]® < a//, gen > ®n~l0 Hst(ggp) J 
Consider now the (revised) ARG-ST list of the MG adjective: 

(23) Argument structure of MG adjective (final version): 

[ ARG -ST < . > © < |T)[a//, gen] > © < N’[COMPS < [Tj >] > ] 

The ARG-ST value in (23) is intended to replace the preliminary ARG-ST value 
specified in (18) above. The first slot ( _ ) is reserved for the specifier. The third 
slot is occupied by an N' element whose CASE, INDEX and RESTR, omitted 
here for simplicity, relate with the CASE, INDEX and RESTR of the adjective’s 
exactly as was shown in (18). The crucial innovation is that N' is specified with 
a potentially nonempty COMPS list which is ‘inherited’ by the adjective and is 
constrained to contain elements of type affix (in fact, at most one). In case the 
‘inherited’ affix list is nonempty, two possibilities exist: (a) the affix is not present 
in the adjective’s own COMPS list, rather it is morphologically realized, in other 
words, the adjective is an instantiation of cl-nom-wd , and (b) the affix is a member 
of both ARG-ST and COMPS of the adjective head and will be ‘inherited’ by a 
higher head, the adjective thus being an instantiation of pl-nom-adj . These two 
options are illustrated in the following tree-diagram, where the irrelevant for the 
current purposes SPR list is omitted for simplicity: 

(see (2b)) 

AP 




The head-complement phrases represented in terms of branching nodes inherit the 
head- complement-phrase (hd-comp-ph) constraint: 



BEST COPY AVAILABLE 
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(24) hd-comp'ph => 

' COMPS [o] 

HD - DTR [ COMPS [o] © < \T\ . . . [n] > ] 

_ NON - HD - DTR < [SS (TJ . . . [SS Q7J > 

(24) is in the spirit of type-specific constraints on phrasal types, as in Sag (1997). 
This formulation enables the first affixal element [5] in the COMPS list of a given 
nominal to propagate to its mother’s COMPS list from where it can be inherited 
by complement composition, the phrasal complements |T| . . . fn"] being cancelled 

off in the syntax, [o] can alternatively be the empty list. 

We can now apply the same to determiner heads and ensure that they combine 
with a single phrasal complement in the syntax, thus giving rise to hierarchical 
structures. The (revised) determiner’s argument structure is as shown in (25). I 
also assume a plain and a clitic morphology determiner type, analogous to those 
provided for nominals in (20b&c) above. 

(25) Argument structure of determiner (final version): 

[ ARG-ST <\J}aff,gen]>®N , [COMPS < (T) >] ] 

Consider finally two remaining issues: (a) the requirement that post-nominal ad- 
jectives should not bear a possessive suffix, as was shown in (5) above, and (b) the 
obligatory ‘possessive doubling’ for first and second person, which was illustrated in 
(9). Postnominal adjectives with clitic morphology can be simply ruled out by the 
Left Triggering Linear Precedence (LP) Constraint in (26) which orders adjectives 
of this type to the left of their nominal syntactic sister: 

(26) Left Triggering LP Constraint: [cl-nom-wd, adj] -< [nom] 

Under the lexicalist account proposed here, ‘possessive doubling’ for first and 
second person can be accounted for by means of a lexical rule reminiscent of the 
one proposed by Sag and Miller (1997) for the ‘floating’ of tous (‘all’) from object 
NPs in French. The rule applies to a noun lexeme with a genitive argument in its 
ARG-ST list, i.e. one that can be realized as a possessive phrase or suffix. The 
resulting lexeme is just like the input, except that (a) the genitive argument is con- 
strained to be of type aff, and, moreover, its person value specified inside the index 
is typed nonthird , a novel type subsuming first and second person, and (b) there is 
an additional ARG-ST member of sort canon, to be realized as an accusative noun 
phrase, that is coindexed with (and hence agreeing with) the affix. The ARG-ST 
list of the input is constrained to contain no accusative elements (list(nonacc)) to 
prevent Possessive Doubling LR from occurring recursively. 

(27) Possessive Doubling LR: 



lexeme 




■ 


SS\LOC\CAT 


HEAD noun 

ARG — ST < | 1 \[gen] > (&list(nonacc ) 





[ ARG — ST < |~T"|(a / /, nonthird]j , [ canon , acc]i > (&list(nonacc ) ] 

5 Conclusion 

In this paper I have presented evidence in support of an affix analysis of the MG 
‘weak form’ possessive pronoun and discussed problems for a PLC approach to this 
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element in terms of prosodic inversion. I have proposed a complement composition 
account that is reminiscent of previous HPSG work in French, but differs from 
such accounts in that it maintains hierarchical structuring in NP by exclusively 
permitting affix members of complement lists to be composed by higher heads. 
This approach presupposes an analysis of both determiners and adjectives in MG 
as heads of the phrase they occur in - a hypothesis which has been entertained in 
diverse theoretical frameworks (GB and Categorial Grammar), and, in addition, is 
independently motivated in MG. 
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Presuppositions as Anaphors Revisited* 
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Abstract 

Van der Sandt’s theory of presuppositions-as-anaphors has been argued to 
be the empirically most adequate theory of presupposition projection on 
the market. One of the main differences between Van der Sandt’s approach 
and its main competitor, the ‘contextual satisfaction’ approach, lies in the 
treatment of the so-called partial match phenomenon. In this paper, we 
show that the distinction between partial and full matches should be a 
central element of any theory of presupposition projection. However, we 
also argue that Van der Sandt’s own formal theory, as it stands, does 
not offer an adequate treatment of partial matches. We then propose a 
modification of his formal theory, which will be argued to be more general, 
formally more precise, and empirically more adequate than its predecessor. 



1 Introduction 

Van der Sandt (1992) ’s theory of presuppositions has been argued to be the em- 
pirically most successful theory on this subject available today (see e.g., Beaver 
1997:983). The crux of Van der Sandt’s approach is the idea that, in many re- 
spects, presuppositions behave as anaphors. A consequence of his presuppositions- 
as-anaphors view is that the notorious projection problem for presuppositions 1 can 
be reduced to the problem of resolving anaphoric pronouns. More concretely, Van 
der Sandt argues that presuppositions can be handled using the same mechan- 
ism which resolves anaphoric pronouns in Discourse Representation Theory (DRT, 
Kamp & Reyle 1993). 

The main competitor of Van der Sandt’s approach might be dubbed the con- 
textual satisfaction approach to presuppositions , which has its roots in the work 
of Karttunen and Stalnaker, and of which Heim (1983, 1992) and Beaver (1992, 
' 1995) are the modern (i.e., dynamic) hands on the torch. The central idea of this 
approach is that the presuppositions of a sentence must be entailed by the context 

*The authors wish to thank David Beaver, Bart Geurts and Rob van der Sandt for comments 
on an earlier version of this paper, and the audiences of CLIN 96 (Eindhoven) and the Workshop 
on Definites (November 28th, 1996, Groningen) for comments and suggestions. 

HPO, Center for Research on User-System Interaction 

* Philips Research Laboraties, Eindhoven 

1 Langendoen h Savin (1971: 54): “ how [are] the presupposition and assertion of a complex 
sentence (, . . ) related to the presupposition and assertion of the clauses it contains ?” 
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of interpretation in order for this context to admit the sentence. When Van der 
Sandt (1992: 349-351) compares his approach to the contextual satisfaction ap- 
proach, he claims that the difference between the two approaches comes out most 
clearly when considering what, following Van der Sandt, might be called the partial 
match phenomenon , and of which (1) is one example. 

(1) If John has an oriental girlfriend, his girlfriend won’t be happy. 

The possessive description his girlfriend triggers the presupposition that John has a 
girlfriend. According to Van der Sandt, this example displays a genuine ambiguity 
between two readings, depending on whether his girlfriend refers to an oriental 
girlfriend or not. The two readings may be paraphrased as (2. a) and (2.b). 2 

(2) a. If John has an; oriental girlfriend, she; won’t be happy. 

b. John has a j girlfriend and if he has an; oriental girlfriend (as well), she^ 
won’t be happy. 

Van der Sandt claims that this is exactly what his theory predicts, while the satis- 
faction approach only gets the first reading; after all having an oriental girlfriend 
entails having a girlfriend. 3 However, if we apply Van der Sandt’s formal theory 
to examples such as (1), as we will do below, we find that there is a discrepancy 
between his intuitions about these partial match examples and the predictions 
made by his formal theory. In this paper we will try to resolve this discrepancy. 



2 Van der Sandt: presuppositions as anaphors 



But first, let us say something about the approach to presuppositions presented in 
Van der Sandt (1992), Consider example sentence (3), discussed by Van der Sandt 
(1992:360/1) and its representation (DRS 1). 

(3) If John has a child, his child is happy. 



(DRS 1) 



x 



x = john 



y 






child( y) 




happy {z) 


poss(x,y) 






z 








d 


child(z) 










poss(x, z) 





2 Van der Sandt (1992:350/1) provides extra evidence for this ambiguity by showing that dif- 
ferent continuations can eliminate one of the readings. Thus, continuing (1) with She has always 
been rather jealous (Van der Sandt 1992: 351) eliminates the (2. a) reading in favor for (2.b). 
Continuing (1) with But if he has one from France , . . . will eliminate the (2.b) paraphrase. 

3 This is indeed the case for the straightforward conception of the satisfaction approach. How- 
ever, Zeevat (1992:387) claims that it depends on the representation of the presupposition whether 
it is entailed or not. Zeevat does not make these ideas more precise (nor, to the best of our 
knowledge, does anyone else). 
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The consequent of the conditional contains an embedded DRS, representing the 
presupposition that John has a child, triggered by the possessive definite his child . 
We mark a DRS as presuppositional by prefixing it with a d . The d operator 
is due to Beaver (1992), but in the present paper it is only used to syntactically 
distinguish presuppositional DRSs from ordinary, assertional ones. Now Van der 
Sandt’s presupposition resolution algorithm is applied to this DRS, and starts 
looking for a suitable and accessible antecedent for the presupposition (as it would 
do for an anaphoric pronoun). Obviously, the discourse referent introduced for a 
child (i.e., y) is the ideal candidate. So, the presupposition can indeed be bound. 
Binding a presupposition goes as follows: the presuppositional DRS is removed 
from the DRS where it originates (the source DRS , for short), and merged with 
another DRS (henceforth the target DRS), namely the DRS which introduces the 
antecedent to which the presupposition is bound. Furthermore, this target DRS 
is extended with an equality condition which equates the referent introduced in 
the presuppositional DRS with the referent of the antecedent. In this way the 
anaphor is ‘absorbed’ by the antecedent (Van der Sandt 1992: 349). By binding 
the presupposition, (DRS 1) is transformed into (DRS 2), and this DRS can be 
paraphrased as if John has a child, it is happy. 
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child(y) 
poss(x, y) 




happy(y) 













A difference between presuppositions and pronouns shows up when there is no 
suitable and accessible antecedent. In that case, a presupposition can be accom- 
modated. Consider the following example with its associated DRS: 

(4) If John has an oriental girlfriend, his son is happy. 



(DRS 3) 
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x = john 
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girl f riend(y) 
poss(x,y) 
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z 






son(z) 
poss(x , z) 



Again, the resolution algorithm will look for an accessible and suitable antecedent 
to bind the presupposition that John has a son. There are two accessible ante- 
cedents (John and his oriental girlfriend) but neither can qualify as suitable. Hence 
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we accommodate the presuppositional DRS. If certain conditions are met, 4 accom- 
modation takes place in the main DRS (see Van der Sandt 1992: 345 for explana- 
tion). Technically, accommodating a presuppositional DRS amounts to removing it 
from the source-DRS and merging it with the target DRS (which — under normal 
circumstances — is the main DRS). Thus: 

(DRS 4) 



This results in a reading which may be paraphrased as John has a soni such that 
if John has an oriental girlfriend, hei is happy. As this paraphrase indicates, 
after accommodating the presupposition the resulting DRS entails that John has 
a son. In general: accommodating the presupposition in the main DRS yields a 
‘presupposing’ reading (the presupposition is projected). By contrast, from (DRS 
2) it does not follow that John has a child; the presupposition is not projected and 
this produces a ‘non-presupposing’ reading. 

It may be that there are several ways to resolve a presupposition. This brings us 
to a last, crucial ingredient of Van der Sandt’s theory: the definition of a preference 
order over permitted interpretations. Van der Sandt defines a preference order 
based on the following general principles: 

Definition 1 (Van der Sandtian preferences) 

1. Binding to a suitable antecedent is preferred over accommodation. 

2. Accommodation is preferred to occur as far from the source-DRS as possible. 

3. Binding is preferred to occur as near the source-DRS as possible. 

In most cases, these preference rules order the set of admissible resolutions in such 
a way that there is one most preferred reading. Following Van der Sandt we will 
speak of a genuine ambiguity when there is no single most preferred reading. Ac- 
cording to Van der Sandt (1992:363) partial match examples display such a genuine 
ambiguity, and he claims that this is one of the phenomena that his theory can ac- 
count for, while the satisfaction camp cannot. However, things are somewhat more 
complicated. So let us now take a closer look at the partial match phenomenon. 

4 Of which the Consistency and the Informativity constraints are the most important ones. 
Roughly, the first says that accommodating a presupposition should never lead to an inconsistent 
DRS. Similarly, the informativity constraint states that accommodating a presupposition should 
never lead to a situation in which one of the sub-DRSs becomes redundant (is not informative). 
For more details we refer to Van der Sandt (1992: 367-369). 
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3 The partial match phenomenon 

3.1 The empirical facts: four cases 

I. Antecedent is more ‘informative’ than anaphor Example (1) is a prime 
example of this category, and we fully share Van der Sandt’s intuitions that it 
displays a genuine ambiguity. The intuitions concerning example (1) might be 
a bit blurred due to a kind of lexical ambiguity in the word girlfriend . This is 
especially clear in the paraphrase of the presuppositional reading in which the 
globally accommodated girlfriend is John’s companion in life, while the oriental 
girlfriend in the antecedent is more like a mistress. However, it is not difficult to 
find examples that do not suffer from this problem, e.g., by looking at plurals. 

(5) If John has sons, his children will watch a lot of football. 

This sentence displays the same kind of ambiguity as (1). Thus (5) has a presup- 
positional reading (paraphrasable as John has children and if he has sons, then 
theyi will watch a lot of football) and a non-presuppositional reading (if John has 
sonsi, theyi will watch a lot of football). 5 

II. Anaphor and antecedent are ‘incomparable’ Consider: 

(6) a. If John has sons, his young children are happy. 

b. If John talks to some partygoers, the children will look at him in a 
strange way. 

These are ambiguous in the same way as the partial match examples discussed 
so far. Example (6.b) is ambiguous between a presupposing reading (there are 
children^ and if John meets some partygoers, theyi look at him in strange way) and 
a non-presupposing reading (if John talks to some partygoers, the children among 
them will look at him in a strange way). (6. a) displays a similar ambiguity. 

III. Anaphor and antecedent are equally ‘informative’ The examples in 
this category tend not to be genuinely ambiguous and hence they should not be 
categorized as partial matches. Consider: 

(7) If Fido sees a cat and a mouse, he’ll chase the cat and devour the mouse. 



IV. Anaphor is more ‘informative’ than antecedent Consider (8), which 
is based on an example from Zeevat (1992). 

(8) A man died in a car crash yesterday evening. The 26 year old man that 
caused the accident was found to have been drinking. 

5 Suppose the interpreter knows that due to some specific genetic peculiarity John and his 
partner can never have a girl. Given such background knowledge, the example (5) should not be 
classified in category I, but in III (anaphor and antecedent are co-extensive). This indicates that 
hearer’s knowledge should be taken into account. 
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Examples of this kind must also be categorized a s partial matches, since they con- 
stitute a genuine ambiguity. On the presuppositional reading the presupposition 
triggered by the 26 year old man who caused the accident is accommodated (i.e. , 
the 26 year old man is still alive), and on the non-presupposing reading the pre- 
supposition is bound (i.e., he is dead). 6 Both interpretations are roughly equally 
plausible, as far as we can tell. However, the distribution of such examples is 
limited: e.g., it is difficult to find conditionals which fall in category IV. Consider: 

(9) If John owns a donkey, he will be worried about the purple farmer-eating 
donkey on the loose, (after Beaver 1995:61) 

Here, the presupposing reading seems strongly preferred over the non-presupposing 
one, which is at best marginal. In other words, this sentence does not seem to be 
ambiguous in the same way as for instance example (8) is. In Krahmer (1995:165) it 
is hypothesized that identity anaphora can only add information if the antecedent 
is interpreted specifically. Let us formulate this a s follows. 7 

Informative Anaphors Hypothesis (IAH) 

A potential antecedent with a non-specific interpretation, which is 
less informative than the anaphor under consideration, does not qual- 
ify as a suitable antecedent for the anaphor, provided that the relation 
between anaphor and potential antecedent is one of identity. 

Thus: an (indentity) anaphor can only add information about its antecedent when 
the antecedent has a specific interpretation, and this would account for the fact 
that example (9) does not appear to be a genuine ambiguity. The IAH explicitly 
excludes non-identity anaphors, because it seems possible for such anaphors to add 
information about a subset of the antecedent. 

(11) If Barney owns cows, then he will feel sorry for the mad cows. 

This example indeed displays a partial match ambiguity between a non-presupposing 
reading (paraphrasable as if Barney owns cows , then he will feel sorry for the mad 
cows he owns) and a presupposing one (there are mad cowsi, and if Barney owns 
cows, then he will feel sorry for themi). 

Summarizing, examples of type I, II and IV display a partial match ambiguity. 
Of course, other factors (such as pronominal take-up in continuations or the IAH) 
may cause disambiguation. Similarly, intonation is an important factor which may 

6 Again: extra evidence of this can be given in the form of disambiguating continuations. 
Continuing (8) with The police took the drunk daredevil into custody eliminates the non- 
presuppositional reading, while continuing with This was confirmed by the pathologist who per- 
formed the post-mortem examination eliminates the presuppositional reading. 

7 There do exist some potential counter-examples to the generalization proposed in the IAH. 
Consider, for example the following ‘politically correct ’ usage of the female pronoun. 

(10) If the reader has studied example (10), she might come to the conclusion that it constitutes 
a counterexample to the IAH. 

However, we are unsure whether examples such as (10) are real counterexamples to the IAH. For 
instance, it has been argued by various people that pronouns are essentially devoid of semantic 
content (e.g., by Van der Sandt 1992), so to what extent can they add information? 
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cause disambiguation. It should be stressed however, that intonation, and in par- 
ticular accenting/de-accenting, only leads to partial disambiguation. For example, 
de-accenting the anaphor leads to a preference for binding. When the anaphor 
is accented however, this will only lead to an elimination of the identity reading 
(cf. Van Deemter 1991, 1992); both the presupposing and the non-presupposing 
reading remain possible. Thus, when children in (6.b) receives a pitch accent, the 
reading in which all partygoers are children is excluded, but otherwise the example 
is still ambiguous between the presupposing and the non-presupposing reading. 



3.2 Van der Sandt’s predictions 



I. antecedent is more ‘informative’ than anaphor Let us reconsider Van 
der Sandt’s own (1) again, and let us construct a DRS for this example. 



(DRS 5) 
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girlfriend(y) 
poss(x, y) 



happyjz) 
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girlfriend{z) 
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If we feed (DRS 5) to Van der Sandt’s resolution algorithm, it will first start looking 
for a discourse referent which is accessible and which satisfies the conditions of 
being a girlfriend, and standing in the possessive relation with John. But such 
a referent is easily found: y meets all the conditions. As we saw in section 2, 
definition 1, binding a presupposition to a suitable antecedent is preferred over 
accommodating. In the DRS we are currently discussing, it seems that y is a 
perfectly suitable and accessible antecedent, so it is unclear how Van der Sandt 
(1992)’s formalism can avoid binding the presupposition, which would make the 
non-presupposing reading (given in (2.a)) the primary reading of (1) and hence 
would predict that this example is not truly ambiguous after all. It is conceivable 
that binding is defined in such a way that y is no longer a suitable antecedent, but 
then binding is precluded and accommodation is the only option. Consequently, 
no ambiguity between binding and accommodation is predicted either. Hence, 
one might say that Van der Sandt’s formal theory does not fully implement the 
intuitions sketched in the first part of Van der Sandt (1992). 



II. anaphor and antecedent are ‘incomparable’ The same problem applies 
as in category I, and other problems apply in addition. For example, consider (6.b). 
Here is the Van der Sandtian DRS for this example. 
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(DRS 6) 
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child(Z) 





If we feed (DRS 6) to the algorithm, it will again look for an accessible, suitable 
antecedent. 8 It is unclear to us whether some partygoers is a suitable antecedent 
for the children according to Van der Sandt’s algorithm, but it yields undesired 
results either way. The situation is roughly the same as for (DRS 5): either Y (the 
partygoers) is not a suitable antecedent for Z (the presupposed children). In that 
case, the presupposition is preferably accommodated and no genuine ambiguity 
results. If, by contrast, Y (the partygoers) is a suitable antecedent for Z (the 
children), binding is preferred and, as before, no ambiguity results. But in this 
case, there is an additional problem, which has nothing to do with preferences 
between interpretations. If the presupposition gets bound, it is ‘absorbed by the 
antecedent’, and this results in a reading which may be paraphrased as if John 
meets some party going children , they’ll look at him in a strange way. This reading 
seems wrong. Binding should appear in situ , that is: the presupposition to be 
bound should not be merged with the target DRS, but with the source DRS. 9 
Summarizing, we think that the binding reading of (6.b) should be if John talks 
to some partygoers, the children among them will look at him in a strange way. 
The situation in which all the children happen to be partygoers can be viewed 
as a special case, which is typically marked by the lack of an accent on children 
(see above). Finally, the reader may easily verify that the same problems are 
encountered for case IV. 

8 We follow the notation for plurals used by Van der Sandt (1992: 370), where he explains 
how an example similar to our (6. a) should be dealt with. The capitals are discourse referents 
standing for sets of objects. All predicates in this paper are ‘strictly distributive’ in the sense 
of Kamp & Reyle (1993, 407). E.g., child(X) has the intuitive interpretation that all elements 
of X are children. In Kamp & Reyle (1993) this is denoted as child* (X). We will omit the * 
superscript where this can be done without creating confusion. 

9 Consider another example: 

(12) If John has children, he’ll spoil the little bastards. 

We are well-aware of the fact that epithets like little bastards have some peculiar properties. 
Nevertheless, they serve nicely to further illustrate the point about binding mentioned in the 
main text. If we bind the presupposition triggered by the definite description in Van der Sandt’s 
way, we end up with a reading which may be paraphrased as if John has children and they are 
little bastards, then he’ll spoil them. In other words: the children are only spoiled if they are little 
bastards. In our opinion, the right reading for this example (disregarding the differences between 
presupposed and asserted material) is something like if John has children, they’ll be little bastards 
and he’ll spoil them. 
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4 An alternative 

In the previous section (3.1) we argued that an anaphor and an antecedent stand 
in a partial match relation if the two are not co-extensive. Moreover, on the par- 
tial match interpretation, a sentence is ambiguous between a presupposing and a 
non- presupposing reading (although we have seen that certain independent factors 
may cause disambiguation). In other words, we support the intuition sketched in 
Van der Sandt (1992:349-351). However, if we apply the formal theory (i.e., the 
presupposition resolution algorithm) of Van der Sandt (1992) to the partial match 
examples (as done in 3.2), we encounter two problems: (i) the algorithm does not 
generate the required genuine ambiguity in the case of a partial match, and (ii) 
not all the binding readings are correct. 

We propose a modified version of Van der Sandt’s resolution mechanism. One 
central ingredient is the use of so-called context variables. Binding will be viewed as 
contextually restricted quantification, where the relevant context is provided by the 
anaphoric antecedent. Accommodation will be a contextually restricted variant of 
the usual accommodation procedure. To arrive at all the different possible (binding 
or accommodation) interpretations of a given sentence containing a presupposition, 
we exploit Van der Sandt’s resolution mechanism, with its use of unresolved rep- 
resentations. However, we make some modifications to the resolution mechanism 
as such, taking the notion of partial match into account by paying more attention 
to properties of potential antecedents. When antecedent and anaphor stand in a 
partial match relation, the algorithm will generate a real ambiguity. This entails 
that our modification of the algorithm yields a modified, partial preference order 
between possible interpretations. 

4.1 Preliminaries 

Van der Sandt (1992) is mostly based on the basic, first-order DRT fragment. The 
kind of examples we are interested in, and the treatment we have in mind for them, 
calls for two extensions of this basic DRT fragment. 

Plurality and quantification in DRT In the following, we adopt the basic 
treatment of plurality and quantification outlined in Kamp & Reyle (1993, ch. 4). 
Kamp & Reyle use an algebraic ‘Link-style’ interpretation of plurality, in which the 
domains contain atomic as well as non-atomic entities. Following the convention 
of Kamp & Reyle (1993), we use boldface lowercaps variables (x, y, z, . . . ) to 
•range over both individual (or atomic) referents and plural ( non-atomic ) referents. 
Lowercase variables (x,y,z, ...) are used for individual referents, and uppercase 
variables (X, Y, Z, ...) for plural referents. This convention entails that general 
definitions contain boldface referents, and actual examples do not. 

We also adopt the treatment of generalized quantifiers in Kamp & Reyle (1993, 
ch. 4) in terms of duplex conditions. In general, a generalized quantifier (which 
we shall denote as DET) is a relation between two sets of (atomic) entities, say 
A and B , and this is represented by Kamp & Reyle as a condition consisting of 
two boxes A' and B f , representing A and B respectively, separated by a capsized 
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box which contains the quantifier and the variable it applies to. The quantifier 
gets its usual interpretation as known from generalized quantifier theory (GQT; for 
technical details on quantifiers in DRT we refer to Kamp & Reyle 1993:425-427). 
Here are GQT-style definitions of singular and plural the ( d and d ' atomic): 

the S9 (A)(B) is true with respect to a model M iff 
3 deD:de AbVd* e D{d f eA^d' = d)&deB 

the pl (A)(B) is true with respect to a model M iff 
3d e D : d £ A & Vd' € D(d ' e A => d! G B) 

It is worth pointing out that Kamp & Reyle still distinguish indefinites from ‘truly’ 
quantificational determiners, and we will follow this practice, as we have done so 
far. Concretely, this means that indefinite NPs of the form DET CN, where DET is 
either a(n) y some or empty (in the case of bare plurals) introduce a fresh discourse 
referent in the current DRS. 

Context variables In Westerstahl (1985) the notion of contextually restricted 
quantification is introduced, motivated by examples such as the following: 

(13) The children were having a lot of fun. 

Clearly this is not a statement about all the children in the universe. According to 
Westerstahl, the definite determiner acts as a context indicator which signals the 
presence of a context set C (Westerstahl 1985:60) in such a way that the children 
denotes C Pi child , i.e., a contextually restricted subset of the set of all children. 

In our revision of the presuppositions-as-anaphors theory, we will use context 
variables, which we will represent as C, C", . . . These context sets are just discourse 
referents (compare Westerstahl 1985:70). Below, we let every NP introduce an or- 
dinary discourse referent and a fresh context set and our modified presupposition 
resolution algorithm explicitly operates on these context sets. It is worth to em- 
phasize that the use of context sets in this paper merely facilitates the resolution 
process. Besides introducing contextual variables, we also employ ‘contextually 
restricted’ predicates. That is, we use conditions like man c (john) which have as 
intuitive interpretation: john is a man and an element of the context set C. 10 

4.2 The presuppositions of definite descriptions 

When the DRS construction algorithm encounters a definite description, the [the 
CN] rule is activated. Here CN is the representation of CN (in singular form, 
where CN is a possibly complex common noun phrase), and z is z or Z depending 
on the number of the CN. This rule is a variant of CR.NP [Quant = +], Kamp 
& Reyle (1993: 318, 347). Definite descriptions are generally assumed to trigger 
an existence presupposition. In this rule this is modelled as follows: a definite 
description presupposes that there is some context set C which has a non-empty 
intersection with the CN denotation. 

10 Formally, if rf is a noun representation: M \= j rf C (x) iff /(x) € Im{ 7 )) FI /(C). This clause is 
a variant of cause (ii:g:i) of definition 4.3.7 of Kamp & Reyle (1993:426). 
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DET 



CN 



Rule, for DET = the 



Upon encountering an S of the form a(3 or a VP of the form 0a, 
with a a definite description (of the form the CN[± sg]), replace 
S or VP with the following presuppositional DRS and 
duplex condition, where y and z are fresh discourse referents 
and C is a fresh context variable. 




To illustrate the [the CN] rule, consider example (6.b) again. This sentence is 
represented by (drs 7). Some is indefinite: it introduces a fresh (non-atomic) 
discourse referent Y . The children is handled by our definite descriptions rule: 
it introduces a presuppositional DRS, with the intuitive interpretation that there 
is some context set C which contains children, and a duplex condition, which 
expresses that all children in this context set C look at John in a strange way. 



(drs 7) 



x — john 
Y 

partygoer(Y) 
talk(x } Y) 







4.3 The modified presupposition resolution algorithm 

When Van der Sandt’s resolution algorithm encounters a presuppositional DRS 
it will first try to bind this presupposition to an antecedent, and our modified 
algorithm will do the same. This immediately raises a question: what qualifies 
as an antecedent? The answer of Van der Sandt (1992) is simple: every suitable 
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discourse referent which is accessible from the DRS containing the presuppositional 
DRS is a potential antecedent. Van der Sandt (1992) does not specify what makes 
a referent suitable. In our opinion, the main factor in determining the suitability 
of a discourse referent is the phrase which lead to the introduction of the referent. 

(14) a. Yesterday, ani uncle of mine bumped into a 2 man. The; man fell down, 
b. Yesterday, a 2 man bumped into ani uncle of mine. The; man fell down. 

We contend that in both (14. a) and (14.b), the definite the man is strongly preferred 
to be coindexed with a man (i.e., i = 2), even though obviously both 1 and 2 are 
male persons. This is due to the fact that 1 is introduced as a man, while 2 is 
introduced as an uncle. 11 This shows that the resolution algorithm should not 
only take discourse referents into account, but also properties of the phrase which 
lead to the introduction of the referent. In particular, we are interested in the 
possible values which a discourse referent can have according to the denotation of 
the phrase with which the referent is associated. For this purpose, we will use value 
sets. For the examples in (14) it is the CN which determines the relevant value 
set. But for other phrases which lead to the introduction of a referent (e.g., proper 
names) this may be different. Consider the indefinite description a man with a 
hat , and suppose that it triggers the introduction of a discourse referent y. Then 
the value set of y in a model M and with respect to an assignment /, denoted as 
VAL (y, \[y,z\ man(y), hat(z ), with(y , z)\ Jm,/), is given by: 12 

{d E D | d E I (man) & 3d 1 E D : d f E I (hat) & (d, d ') E I (with)} 

In words: the value set of y in M is the set of men with a hat in M . Notice that in 
the case of atomic predicates P, the value set VAL(x, [P(x)J) equals the predicate 
denotation [Pj. 13 In those cases, we will use the predicate denotation as value set. 
Below we will consider pairs (x, VAL(x, |T|)) consisting of a discourse referent and 
a corresponding value set as antecedents. We are now in the position to sketch 
our modified resolution algorithm. The input of the algorithm is an underspecified 

11 Another illustration of this is the following minimal pair. 

(15) If John is looking at some [cjsj children], who play basketball, then the children will strive 
to impress him. 

(16) If John is looking at some children that play basketball], then the children will strive 
to impress him. 

The only difference between the two examples is that in (15) a referent is introduced by children 
while in (16) it is introduced by children that play basketball. Now, example (16) is ambiguous 
and (15) is not. The latter only has a non-presupposing reading; we cannot continue this example 
with They know he is a talent scout for Utah Jazz. Example (16), on the other hand, displays a 
partial match ambiguity between a presupposing and a non-presupposing reading. 

12 Reference to models and assignment functions is omitted where this can be done without 
creating confusion. 

13 In general: suppose that a phrase a leads to the introduction of a (atomic or non-atomic) 
discourse referent x. The value set of x with respect to <I> (where $ is the DRS which results 
from a) and given a model M and an assignment function / is defined as VAL(x, [$]jvf,/) = def 
{d G D\M |=/u(x,d) $} The embedding function / is only needed when $ is not a proper DRS, 
i.e., when some condition in contains a discourse referent that is not introduced in $, that is, 
if the rj phrase contains a pronoun (e.g., the man that saw him). 
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DRS containing at least one unresolved presuppositional DRS. As we have seen, 
for definite descriptions this presuppositional DRS will be of the form: 

(DRS 8) 



For each presuppositional DRS there is a list of Potential Antecedents (pa), and 
as argued above this is a list of accessible discourse referents plus their respective 
value sets. This list is ordered by nearness to the presuppositional DRS, i.e., the 
first element on the list is the nearest referent and the last element is the one 
farthest away. In general, this list will appear as follows: 14 

PA = «x x ,VAL(x x , [T X D), • • • , <Xi, VAL(x;, [T»])), . . . , <x n , VAL(x n , [T n ])» 

The modified resolution algorithm is now going to try and bind the presuppositional 
(drs 8), triggered by the definite description, to an element of the list of potential 
antecedents. We use PRESm to denote the value set of the referent associated with 
the phrase which triggers the presupposition. In the case of definite descriptions 
(as in (DRS 8)), PRESm = VAL(y, [ [y| C7V(y)] ]m )• In general, PRES equals 
VAL(y, [TJ), where T is the DRS representing the phrase which has led to the 
introduction of y. Similarly, we use ANT l M as an abbreviation of VAL(x;, [TiJ^), 
for some (x;, VAL(x;, [T^m)) £ PA. 15 

IF 3i(PRES M = ANT l M , in all H-models M) 

THEN bind 

ELSE IF Vz(PRES m n ANT l M = 0, in all H-models M) 

THEN ACCOMMODATE 
ELSE (Partial Match!) 

BIND OR ACCOMMODATE 

In words: the algorithm first checks if there is a potential antecedent with the same 
denotation as the presupposition in all H-models. If it finds one, it is a full match 
and the presupposition will be bound (both the bind and the ACCOMMODATE oper- 
ation will be defined below). If the value set of the presupposition is disjoint with 
the value sets of all potential antecedents, the presupposition is accommodated. 
The other cases are partial matches: there is no antecedent with the same value 
set as the presupposition, but there is an antecedent which matches partially, i.e., 
has a non-empty intersection with the presupposition in some H-model, then the 

14 Nearness has an obvious formal definition in terms of subordination, see Krahmer &; Van 
Deemter (1997). Instead of a list, PA should be a partial order (because several discourse referents 
may be introduced at the same level and these are ‘equally far away’ from the source-DRS), but 
we will ignore this here. 

15 It has been noted in footnote 5 that the hearer’s background knowledge may cause disam- 
biguation. This was illustrated by example (5). It was argued that if the interpreter knows that 
John and his partner do not have daughters , this example only has a non-presupposing reading. 
Therefore, our algorithm will not quantify over all possible models, but rather over all models 
which are in accordance with the interpreter’s knowledge state. For this case, the interpreter’s 
H-models (H for hearer) will not include models in which John has daughters. In what follows, 
specific hearer knowledge will not be taken into account, unless noted otherwise. 
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presupposition can either be accommodated or bound to this partially matching 
antecedent. Before we can return to our example, we have to define the notions 
BIND and ACCOMMODATE. To begin with the former, it follows from the algorithm 
that we BIND the presuppositional (DRS 8) if an antecedent (x*, ANT 1 ) £ PA has 
been found such that ANT* is either coextensive with the value set PRES (full 
match), or has a non-empty intersection with it (partial match). 

Definition 2 (bind) 

(xj, ANT 1 ) is the nearest antecedent in PA: 

1. merge the presuppositional DRS with the source DRS, and 

2. add a condition C — x* to the source DRS 

Binding is in situ (the presuppositional DRS is not moved to the target DRS, 
where x^ was introduced, as in Van der Sandt 1992). Moreover, it generalizes to 
non-identity anaphors since only the context set is equated with a set of objects, 
as illustrated for example (6.b) below. ACCOMMODATE is defined as follows: 

Definition 3 (accommodate) 

The main DRS is the (initial) target DRS: 

1. remove the presuppositional DRS from the source DRS and merge 

it with the target DRS, 

2. add a condition C — D to the target DRS 16 

3. check whether the result satisfies the Van der Sandt conditions, 

(consistency, informativity &c). If not, redo 1-3 with a new target 
DRS: the one immediately subordinated by the old target DRS 

The second clause states that the context variable C is equal to the domain of 
discourse, thereby neutralizing the effect of C. It is worth emphasizing that this is 
done to keep the differences with Van der Sandt to a minimum: it entails that our 
ACCOMMODATE is the same operation as Van der Sandt (1992)’s accommodation. 17 
Reconsider our example (6.b), and its associated (DRS 7). (DRS 7) is the input 
for our modified resolution algorithm. The list of potential antecedents for the 
presuppositional DRS looks as follows: 18 ({Y, {party g oer}) , (x, {john})). Let us 
assume that there is no specific hearer knowledge, then there will be an H-model 
M such that {party goer} M ^ {child} m- In other words: there is no full match 
between some partygoers and the children. However, there will also be an H-model 
M in which {party goer } m H {child } m / 0 (after all, children can be partygoers). 
In other words: the algorithm predicts that this is a partial match, and a genuine 
ambiguity between a binding and an accommodation reading ensues. (DRS 9) 
results when we BIND the presuppositional DRS. This DRS can be paraphrased as 
If John talks to somei partygoers, then there are children 7 among themi, and all of 

16 The constant D refers to the domain of discourse: [D]m = /m(D) = Dm- 
17 Krahmer & Van Deemter (1997) explore an alternative definition, where C is not necessarily 
equated with the entire domain, but rather with a contextually salient group of individuals. 

18 Since, VAL(K, [ [Y | party goer(Y)] ]) is equal to {partygoerj, and VAL(:c, [ [x\ x = john] ]) is 
equal to {john}, we opt for the more simple notation. 
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the children among thei party goers look at him in a strange way. And, as argued 
above, this is the correct binding interpretation. 



(drs 9) 



x = john 
Y 

partygoer(Y) 
talh(x, Y) 




The second reading comes about via a global application of ACCOMMODATE: 
(drs 10) 

XyCy Z ~ 



x — john 
child 0 (Z) 
C = D 





Y 




partygoer(Y) 
talk(x } Y) 







V 


( the 

\y 






child 0 (v) 


lookat(v, x) 



In words: There are somei children, and if John talks to some partygoers, all thesei 
children will look at him in a strange way. 19 Summarizing, if we feed the represent- 
ation of example (6.b), (drs 7), to the modified resolution algorithm, it decides that 
there is a partial match between the presupposition triggered by the children and 
its antecedent some partygoers. The corresponding ambiguity is between (drs 9) 
and (drs 10) for the non-presupposing/binding and presupposing/accommodation 
interpretation respectively. 



5 Concluding remarks 

We have seen (section 3.2) that the otherwise empirically successful formal theory 
of Van der Sandt (1992) does not always make the right predictions in cases where 

19 Under the alternative definition of ACCOMMODATE mentioned in footnote 17 the resulting 
reading can be paraphrased as there is a contextually salient group of children, and if John talks 
to some partygoers, all these children will look at him in a strange way. 
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there is a partial match between a presupposition and a potential antecedent for this 
presupposition. We think that the problems with partial matches can be solved by 
refining and extending Van der Sandt’s algorithm, and we have tried to do so. The 
resulting version of the presuppositions-as-anaphors theory differs from the one of 
Van der Sandt (1992) mainly in these respects: (1) It contains a precise definition 
of the ‘partial match’ phenomenon; (2) we have modified the resolution algorithm 
in such a way that — in accordance with Van der Sandt’s intuitions — partial match 
sentences come out as genuine ambiguities; and (3) binding is redefined in such a 
way ( f in situ) that non-identity anaphors receive adequate interpretations. In this 
paper we have opted for a frog perspective on presupposition projection, focussing 
on one kind of presupposition triggers: definite descriptions. However, in Krahmer 
& Van Deemter (1997) it is shown that there are few impediments to extending the 
approach described here: a general Noun Phrase presupposition scheme is proposed, 
applying to any NP, and it is shown that the modified resolution algorithm yields 
the required results in these cases as well. 
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Modeling Coordination by Means of Operations on 
Strings and on Derivation Trees 

Carlos Martm-Vide* 

Gheorghe Paunti 



Abstract 

Several operations on strings are introduced, a s models of the phenomenon 
of coordination in natural languages. Their relationships with other string 
operations is investigated. On this basis, the closure properties of families in 
the Chomsky hierarchy are obtained. In particular, we prove that the family 
of context-free languages is not closed under all but one of these operations. 
This special case concerns the coordination defined only between strings with 
a common syntactic structure (both strings have derivations described by 
identical trees, modulo the coordinated subwords). Some interpretations of 
these results are mentioned. 



1 Coordination; Some Variants 

The idea we start from is that in a given coordinated structure all the conjuncts 
are of the same type and status, and the coordination as a whole is of the same 
type and status as its subparts. This means that it is not possible to define only 
one head in the construction (coordination is not a projection of the conjunction, 
and there is no dependence between the two conjuncts), making it impossible to 
apply any general principle of hierarchical construction of the sentence (X-bar, for 
instance). 

Coordination is basically a recursive phenomenon, because it builds phrase 
structures (trees associated to strings) of any length. Several studies of coordina- 
tion start from the following generalization due to Chomsky, Chomsky (1957): 

If S\ and S2 are grammatical sentences, and S\ differs from 52 only in 
that X appears in S\ where Y appears in 52 (i.e., S\ = ... X .. . and 
5 2 = . . . Y . . .), and X and Y are constituents of the same type in S\ 
and 52, respectively, then 53 is a sentence, where S 3 is the result of 
replacing X by X + and 4- Y in 5i (i.e., S 3 = ... X + and + Y . . .). 

* Research Group on Mathematical Linguistics and Language Engineering (GRLMC) 

1 Institute of Mathematics of the Romanian Academy 

* Research supported by the Secretaria de Estado de Universidades e Investigation - SAB95- 
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The usual treatment of coordination phenomenon starts from the idea that two 
categories can be catenated with a conjunction giving a larger category of the same 
type. The classical rule of this description obeys the context-free requirements, 
where X can be any linguistic category or nonterminal: X — > X and X. 

This schema can produce the well-known coordination cases between equal cat- 
egories, which are common in all languages. Several problems arise if we include 
the treatment of the following cases of coordination: 

1. Coordination of unlike categories. The general schema for this kind of sen- 
tences is similar to Z —> Y and X , a s appearing in the following examples Sag, 
Gazdar, Wasow, and Weisler (1985): 

a. Pat is stupid and a liar (AP and NP). 

b. Pat is a Republican and proud of it (NP and AP). 

c. Pat is healthy and of sound mind (AP and PP). 

d. That was a rude remark and in very bad taste (NP and PP), 

where X and Y are different categories or nonterminals and Z is any category 
resulting from both X and Y . The problem in this schema is to explain how Z 
is constructed, because it is neither equal to X nor equal to Y (and therefore the 
rule is not recursive). 

2. Binary coordination between many pairs of nonterminals. The following 
sentences are typical for a language of the form {a n b 7n c n d 7n | n,m > 1 }, possessing 
crossed dependencies which cannot be produced by means of context-free rules 
without regulation: 

a. John sent a letter and a postcard to Mary and to Paul, respectively. 

b. The boys and the girls eat apples and bananas, respectively. 

c. The boys and the girls run and walk through the garden, respectively. 

d. *John and Mary sings and dances, respectively. 

3. Non-constituent coordination and gapping phenomena. English and other 
languages contain a number of coordinate constructions where the conjuncts are 
not constituents in the normal sense but are sequences of constituents. The general 
term for such constructions is non- constituent coordination : 

a. Mary studies art, John, music and Paul, history. 

b. Harry has sent a letter to Mary, and John, a postcard to Paul. 

c. Paul composed, and John posted, a letter to Mary. 

Conjunctions can be performed in ordered pairs, where the order of the elements 
is fixed. The members of a couple may be the same (o . . . o (Spanish), et . . . et 
(French)) or different (both ... and (English)). One calls binary coordination the 
structure which has two conjuncts, and multiple coordination the structure with 
more than two. This may express a restriction over the set of conjunctions: but 
cannot appear in multiple coordination, and neither the couples with different 
words, as in the case of both . . . and. All languages use this kind of structures, and 
this assumption suggests a unified treatment of the phenomenon. We try to do this 
in terms of formal operations on strings and languages. 
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2 Formal Language Prerequisites 

As usual, V * denotes the free monoid generated by the alphabet V with respect to 
the operation of concatenation; A stands for the empty string, V + = V* - {A}, and 
\x\ is the length of x G V*. The set of all prefixes of x G V* is denoted by Pref(x). 
The right derivative of a language L C V* with respect to a string x G V* is defined 
by: d£(L) = {y E V* \ yx E L}. The shuffle of two strings x,y G V* is defined by: 
x 111 y = {u\V\ . . . u n v n | n> l,x = U\ . . .u n ,y = v\ . . . v n , Ui,V{ G V*, 1 < i < n). 
For Li,L 2 C V* we write L\ 111 L 2 = U x€L uy eL 2 ( x 111 ^)* 

A context-free grammar is denoted by G = (N, T, S, P), where N is the nonter- 
minal alphabet, T is the terminal alphabet, S G N is the axiom, and P is the set 
of productions, written asA — > x, A € N, xG (TV LIT)*. The derivation relation is 
denoted by =>, its reflexive and transitive closure by =>*; the language generated 
by G is denoted by L(G). 

The families of regular, linear, context-free, context-sensitive, and recursively 
enumerable languages are denoted by PPG, LIN , CP, CS, and PP, respectively. 

A gsm (“generalized sequential machine”) is a system g = (K y Vi,V 2 ,So, P, <5), 
where K is the set of states , Vi is the input alphabet, V 2 is the output alphabet, 
So G K is the initial state , P C P is the set of final states , and <5 is a mapping 
(called transition mapping) from x Vi to the set of finite subsets of 2^ We 
extend <5 to K x Vi* as follows: 

<5(s,A) = (A,s), 

6(s,ax) = {( yx\s ') | (x',s') G <5(s",x), (?/, 5 ") G < 5 ( 5 , a)}, 
for all 5 G P, a G Vi, x G Vi*. Then, for w G Vi*, we define 

^(iu) = {z G V 2 * | ( 2 , 5 /) G <5(s o ,u0, for some sj G P}, 

ff( L ) = U 

A gsm is said to be A-free if <5(s,a) C x P, for all a G Vi, 5 G P. 

It is known that the families in the Chomsky hierarchy are closed under A-free 
gsm mappings. 

A morphism h : V* —> V* is said to have limited erasing on a language L C V* 
(in short, we say that h is limited on L) if there is a constant k such that |x| < 
k\h(x)\ for all x G L,x ^ A. 

All families in the Chomsky hierarchy are closed under limited morphisms. 

For further notions and results in formal language theory we use here, we refer 
to Rozenberg and Salomaa (1997). We only recall here the important notion of a 
derivation tree . 

Given a context-free grammar G = (P, T, 5, P), a tree r with the nodes labelled 
by elements of N U T U {A}, is a derivation tree with respect to G if: 

1. the root of r is labelled by 5, 

2. if the descendents of a node labelled by some A G N are c*i, a 2 , . . . , ajk, k > 
1,^GPUT, then the production A — > a\ a 2 . - . a* is in P, 
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3. if a node is labelled by A, then it is the only descendent of a node labelled by 
some A € N and A — > A is a production in P, 

4. the nodes labelled by elements of T U {A} have no descendents, all nodes 
labelled by elements of N have at least one descendent. 

The nodes labelled by elements of Tu{A} constitute the frontier of r (the other 
nodes - excepting the root - are said to be internal nodes); we denote by /r(r) 
the string in T* identified by the frontier (we assume r placed with the root above 
and we read fr(r) from left to right, on the frontier nodes). 

For a node v in r, we denote by t(v) the subtree of r with the root in i/, by e(z/) 
the label of v , and by /r(r(i/)) the subword of fr(r) corresponding to the frontier 



It is known that for every string w generated by a context-free grammar G there 
is a derivation tree r with respect to G such that fr(r) = w; if G is unambiguous, 
then r is unique. 

3 Two Basic Coordination Operations on Strings 

Intuitively, for two strings x,y E V* with a common prefix, x = ux\ y = uy\ the 
coordination of x, y leads to a string z = ux'y ' (or z = uy'x ', if the order is not 
relevant). We consider here two variants of this operation, depending on whether 
the common prefix is maximal or not. 

To start with, let us denote by mp(x,y) the longest common prefix of x, y: 

m P{x,y) = u iff x = ux',y = uy! and there is no u f £ V* 

such that x = u'x" ,y = u f y " and \u f \ > |u|. 

Then, the free prefix coordination of x,t/ is defined by: 

Cf p (x,y) = {uxy 1 \ x = ux\y = uy\ for some u,x / ,y / € V*}, 

whereas the maximal prefix coordination of x, y is defined by: 

Cm P {x,y) = {ux'y'} iff x = ux,y = uy' u = mp(x, y), x', y' e V*. 

Observe that Cf p (x,y) can contain several strings, that always C rnp (x 1 y) C 
Cf v (x,y) and that C mp (x,y) ^ 0 (at least A E Pre/(x)flPre/(y); if mp(x,y) = A, 
then Cm P (x,y) = xy). 

Each operation C a ,a € {/p, mp}, is extended in the natural way to languages: 



In order to settle the closure properties of languages in the Chomsky hierarchy 
with respect to operations C a , we use the following two auxiliary results, relating 
C a to known operations on languages. 



of t(v). 



C a (L)= (J C a (x,y). 
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Lemma 1. If FL is a family of languages closed under union, concatenation 
with symbols , intersection with regular languages , and right derivative, then FL 
closed under C a ,a G {fp.mp}, implies FL closed under intersection . 

Proof . For a family FL as above, consider two languages Li, L 2 6 FL, Li, L 2 C 
V*, and construct: 

L = Li{cci} U L 2 {cc 2 }, 

where c,Ci,c 2 are symbols not in V . For the regular language R = F*{ccic 2 } we 
obtain: 

Ll n L 2 — ^CCiC2 (C'a(L) n R),a G {fp>rnp}> 

(C) If x G Li nL 2 , then xi = xcc\ G Li,x 2 = xcc 2 G L 2 . Clearly, mp{x i,x 2 ) = 
xcci c 2 G C a (L) D F. Moreover, x = 3,T ClC2 (xcciC 2 ). 

(D) Take x G ^ ClC2 (C a (L) fl F). Then xccic 2 € C a (L) D F, hence there 

are xi G Li{cci},x 2 G L 2 {cc 2 } such that xccic 2 G C’ a (xi,x 2 ). Because xccic 2 
contains one occurrence of C\ and one of c 2 , we must have x x = G Li, 

and x 2 = x 2 cc 2 ,x 2 G L 2 . Because xccic 2 contains only one occurrence of c, it 
follows that mp(xi,x 2 ) = x 3 c. Consequently, x[ = x' 2 = x 3 = x. This implies that 
x G Li n L 2 . 

From the closure properties of FL, we obtain that L G FL; if C a (L) G FL, 
then also L x n L 2 G FL. 0 

Lemma 2. // FL zs a family of languages closed under shuffle, X-free gsm 
mappings, and limited morphisms, then FL is closed under C a ,a G {}p,nip}. 

Proof Let FL be a family as above and take L G FL, L C V*. Consider a new 
symbol, c £ V . For each a £ V, take a new symbol, a', and denote V' = {a 1 \ a G 
V}. Consider the morphisms 

h : K* — * V**, defined by h(a) = a , for all a G V, 

/i' : (V U {c})* — * V* , defined by h'(a) = a , for a G V, and h'(c) = A. 

Consider also the regular language 

R = {aa' \ aeV}*{c 2 }V*V’* 



and the gsm 



9 = ({so,Si,s 2 },F UF'U {c},F U {c},s 0 , {^ 2 }, ^), 

with the mapping 6 defined by 

6{s 0 ,a) = {(a,s 0 )}> a € V, 6{si,c) = {(c, s 2 )}, 

S(s 0 ,a’) = {(c,s 0 )}, a6 V, S(s 2 ,a) = {( a,s 2 )}, a e V, 

S(s 0 ,c) = {(c,si)}, S(s 2 ,a') = {(a,s 2 )}, a £ V. 

Then we obtain: 

Cjp(L) = (i'(j(((t uj M) in ML) in {c})) n fl)). (») 
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(C) If x £ Cf p (L), then x = ux'y' , for some x\ ~ ux\y\ ~ uy\ both in L. Then 
ucx ' € L 111 {c} , h(u)ch(y') £ h(L) ill {c}. If u = d x d 2 . . . a k , k > 0, a { £ V, 1 < i < 
k , then did[d 2 d 2 . . . a k a f k ccx' h(y') £ ((L III {c}) ill ( h(L ) ill {c})) n F. 

The gsm <7 works as follows: 

- scanning a prefix bid[b 2 d 2 * • -M' r G (W # )*, one produces b x cb 2 c. . . 6 r c, 

- when reading cc, these symbols are left unchanged, - 

- from now one, each a £ V remains unchanged, and each a' £ V f is replaced 
by a. 

Therefore, g{a\a , l . . . a k a f k ccx f h(y f )) = a\C . . .dkcccx'y' . Then, the morphism h! 
removes all occurrences of c, hence we get the string . . . akx’y* = x. 

(D) Take a string x in the set in the right-hand side of the (*). There are 
x\ £ L, x 2 £ L,xs £ R such that x is obtained as in (*) from X \ , x 2 , £ 3 . Denote: 

x[ = Uicx" £ L ill {c}, for xi = Uix", 

x 2 = h(u 2 )ch(x 2 ) £ h(L) ill {c}, for £2 = u 2 x 2i 

£3 = aid[ . . . dkd k ccx f 3 x ^ , for k > 0,x f 3 £ V * , £3 £ V f * . 

We must have d\d[ . . . d k d ( k £ U\ LLi h(u 2 ), hence U\ = u 2 = d \ . . . a k . Moreover, 
£1 = £3 an ^ h(x 2 ) = £ 3 . Consequently, x\ = ux'[,x 2 ~ ux 2 , for u = = u 2 , and 

£ = ux^x 2 (by the definition of g and h'). But ux f {x 2 £ C fp (x u x 2 ) C C fp (L). 

According to the closure properties of FL, we get C fp (L) £ FL (note that g 
is A-free, h! is limited, and that the closure under gsm mappings - even A-free - 
implies the closure under the intersection with regular languages). 

For the case of C mp we replace the regular language R by: 

R' = {dd* | a £ V}* {c 2 }({bud'v \ b,d £ V,u£ V*,v £ V’\b ± d) U V* U V’*). 

As above, we obtain: 

C mp {L) = h'(g(((L ill {c}) 111 (h(L) 111 {c})) n R'). 

The intersection with R' forces the selection of strings did[ . . . d k d f k ccx' h(y') as 
above, for d\ . . . d k x' £ L, d \ . . . d k y f £ L , with maximal k : either the next symbol 
(the first one in x ' and in y') is different in the two strings, or one of x\ y' is empty. 

Therefore, C mp (L) £ FL , too. 

Theorem 1 . The families REG,CS,RE are closed under C a , LIN and CF 
are not closed, a £ {fp,mp}. 

Proof. The families in the Chomksy hierarchy have the closure properties in 
Lemmas 1, 2 , but LIN,CF are not closed under intersection. 0 



4 Some Variants of the Basic Operations 

In the definition above, either any common prefix or only the maximal prefix of 
two strings is considered when coordinating the strings. This might not cover the 
case when only certain prefixes can be accepted. This is modeled by the regulated 
prefix coordination operation, defined as follows. 
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For a regular language M C V* and x,y G V*, we define: 

C rp (x , 7/) = {uxy | x = ux ,y = up', for some u, x , y G V*, u G M}. 

(Only prefixes belonging to M are taken into consideration.) 

Another natural variant is to coordinate substrings of the two strings, identifying 
both prefixes and suffixes of them. Formally, the free bilateral coordination of 
x,y £V* is defined by: 

Cfb(x,y) = {uxyv \ x = uxv,y = uyv , for some u,v,x ,y G V*}. 

Because it is not clear how the maximal bilateral coordination should be defined, 
we do not consider here this case. (For instance, consider x = abbab , y = abbbab. 
The maximal common prefix is abb , the maximal common suffix is bbab\ they 
overlap, both in x and in y!) 

We can, however, define in the usual way the regulated bilateral coordination , 
asking that both u , v in the definition above are elements of a given regular lan- 
guage; we denote by C r b this operation. 

Both Cfb, Crb can be extended in the natural way to languages. 

The next step is to iterate the operations C a ,a G {/p, mp, rp, fb, rb}, defining, 
for LCV*: 

Ca(L) = U Ci{L), 

t>0 

where 

C°(L) = L, Ci +1 (L) = C a (C* (L)), i > 0. 

It is easy to see that Lemma 1 holds true with the same proof for all operations 
C a , C*, a as above: with the notation in the proof of Lemma 1, we have: 

C* a (L) D R = C a (L) n R, a e Up, mp,fb}, 

because the intersection with R selects the strings where only one occurrence of c 
is present. Moreover, for 

R = M{ccic 2 }, 

we also cover the case of a G {rp, rb}. 

Theorem 2. The families LIN,CF are closed under none of the operations 
C a , C*, a e {fp,mp, rp, fb, rb} . 

Let us examine now the proof of Lemma 2. 

Instead of transforming a prefix a\a\ . . . ako! k of the scanned string into 
aia 2 . . . a*, the gsm g can also check whether or not aia 2 ■ . . ajt G M , for a given reg- 
ular set. (Take a finite automaton for M and simulate it on the symbols a \ , . . . , a*; 
the details are left to the reader.) Therefore, Lemma 2 holds true also for C rp . 

For the bilateral case, we modify the proof of Lemma 2 as follows: 

- consider three new symbols c\, c 2 ,c, 

- instead of R, consider the regular set: 

R’ = {aa' | a £ V}* {c\}V*V* {c\}{aa' \ a £ K}*, 
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- instead of g, consider the gsm: 

9' = ({50,51,52,53,54},^ U {c u c 2 },v U {c},s 0 , {54}, 5 '), 

with the transition mapping defined as suggested by Figure 1. 

Then, for L C V* we obtain: 

Cfb(L) = h'(g'(((L ill {cic 2 }) Hi (h(L) ill {cic 2 })) n R')). 

If the coordination is regulated by a language M € REG , then, as in the case 
of C rp , we can modify g f above in such a way to check whether or not the used 
prefix and suffix of the current strings are in M . 

Consequently, Lemma 2 holds true also for C a , a G {fb,rb}. 

Theorem 3. The families REG, C S, RE are closed under all operations C a , 
a G { fp,mp , rp, fb,rb}. 

a/a a/a a/a 



n n n 




Figure 1 



Theorem 4. The families CS,RE are closed under all operations C*, a E 
{fp,mp,rp,fb, rb}. 

Proof. For RE , the assertion is obvious (consequence of the Turing-Church 
Thesis). For C5, a straightforward (but long) construction can prove the assertion. 

Here is the idea of such a construction for Cj p . The modifications for the other 
cases are obvious: 

- Start from a context-sensitive grammar G, for a language L CV*; 

- Generate a string x € T(G); 

- Generate one more string y € L{G)\ 

- Find a common prefix of x,y; let it be u (hence x = ux\y = uy'); 

- Having x,y, construct ux'y 

- Consider ux'y' in the role of x and go to step 2. 

It is clear that the grammar G r obtained in this way generates exactly Cj p (L). 
Moreover, G' has a bounded workspace: in order to generate a string w, it uses 
at most a space of length 2|u?| (from x = ux\y = uy f we get w = ux'y ' and 
\xy\ < 2\ux'y'\). Consequently, L(G') € CS. Q 

The closure of REG under the operations C*,a G {/p, mp, /6, r6}, remains 
open. 
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5 Syntactically Grounded Coordination 

In the previous sections we have defined coordination operations on strings x,y 
without taking into account the syntactic structure of x,y. In natural languages, 
we pass from x = ux\y — uy (or x — ux’v, y — uy’v) to w = ux y (w = 
ux'y'v, respectively) only when u (u,v, respectively) has (have) the same syntactic 
structure. This makes necessary to consider the derivation trees of x,y, hence to 
define the coordination for trees, not for strings. 

Take a context-free grammar G = (N, T, S, F). 

Two derivation trees T\ , r 2 with respect to G are said to be coordinable if there 
is a node V\ in T\ and a node v 2 in r 2 such that if we remove Ti(v i) from T\ and 
r 2 (v 2 ) from r 2 and we label both v\ and v 2 with the same symbol, then we obtain 
two identical trees. 

For two coordinable trees ti,t 2 with respect to nodes v\,v 2 labelled by X, Y y 
respectively, we construct a tree 73 as follows: 

1. Excise Ti(vi) from T\. 

2. Label V\ in the remaining tree by a new nonterminal symbol, Z. 

3. Add at this node the subtree defined by the context-free rule Z — ► XY. 

4. Attach T\{yi) to the new node labelled by X and t 2 (i/ 2 ) to the new node 
labelled by Y. 

The obtained tree, 7-3, has a terminal frontier. 

If we have fr(r 1) = UiXiVi, /r(r 2 ) = u 2 x 2 v 2 , for x x = /r(ri(i/i)), x 2 = 
/r(r 2 (i/ 2 )) 5 then, because ti,t 2 are coordinable with respect to v\,v 2 , we have 
m = u 2) vi = v 2 . Then, fr(r a) = ux \x 2 v, for u = Ui = u 2 ,v = Vi = v 2 . 

Therefore, /r(r 3 ) € C f b (}r(ri), jr{r 2 )). 

We say that r 3 has been obtained by coordination from Ti,t 2 . For a grammar 
G, we denote by C(G) the language consisting of all strings /r(r), for r being a 
tree obtained by coordination from two derivation trees with respect to G. 

Note that we define G(G) using exactly one coordination for each string in 
G(G). 

Figure 2 presents the idea of coordinable trees and of coordination. 

The fact that, when coordinating (the frontier of) trees, the common parts of the 
frontiers are not only equal but they also have the same syntactic description has 
a rather powerful (and somewhat surprising) influence on the result: the operation 
.preserves context-freeness. 

Theorem 5. For every context-free grammar G, the language G(G) is context- 
free . 

Proof . If G = (AT, T, 5, F), then we construct the grammar G' = (N U {Z},T U 
{c}, 5, F'), with 

P f = F U {A — ► xZy | A — » xXy and A — > xYy € F, 
for some x, y € (AT U T)* , X, T € N } 

U {Z ^ XcY\X,Y £N). 
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Consider also the regular set: 



R = T*{c}T* 

and the morphism h : (T U {c})* — ► T* defined by h(a) = a, a G T, and h(c) = A. 
We obtain: 



C(G) = h(L(G')r\R). 
S S 




S 




Indeed, because when we coordinate two trees ti ,72 of G with respect to two 
nodes 1/1 , 1/2 the trees obtained by excising t 2 (^2) from ti,t 2 , respectively, 

are identical modulo the labelling of ^1,1/2, there is a rule A — ► used in t x 

and a rule A xYy used in r 2 (possibly X = y). Therefore, the new rules of 
P l perform a coordination operation. Removing c by the morphism /i, we get the 
result of coordination, hence C(G) C h(L(G f ) n R). Because the intersection with 
R selects from L(G') exactly those strings in whose derivation we have used only 
once a rule Z — > XcY , we have also the converse inclusion. 

From the closure properties of CF we obtain C(G) E CF. 0 
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Corollary. If G is a regular grammar, then C(G) is also regular. 

Proof. Exactly as above, starting from G regular, we construct G f . Because all 
recurrent derivations A =>* uAv in G' have u £ T*,v = A, according to Theorem 
5.5 in Salomaa (1973), it follows that C'(G) is a regular language. 0 

For linear grammar, the above results are not true. 

Theorem 6. There is a linear grammar G such that C(G) 0 LIN . 

Proof. For the grammar: 

G = ({5}, {a, b }, 5, {S - aSb , 5 - afc}), 



we clearly obtain: 



C(G) = {a i a n b n a Tn b Tn b i \ i > 0,n,m > 1}, 

which is not a linear language. 0 

In the coordination operation defined above, we allow not only to have both 
nodes v\,v<i identically labelled, but it is also possible to have = 72 (^ 2 ) (not 

to mention the weaker condition, fr{ri{v x )) = fr(r 2 (^ 2 )))* Coordinating with 
n looks artificial. Hence, we say that the coordination of 71 , 7*2 with respect to the 
nodes V\,V 2 is nontrivial if ^ 72 ( 1 / 2 ). We denote by NC(G) the language 

consisting of all strings /r(r), for r being obtained by a nontrivial coordination of 
two derivation trees with respect to G . 

The apparently small difference between usual tree coordination and nontrivial 
coordination turns out to have a surprising effect on the type of the language 
NC(G). 

Theorem 7. There is a linear grammar G such that NC(G) $ CF . 

Proof. Consider again the grammar G in the proof of Theorem 6. All derivations 
in G are of the form: 

S=>aSb=> a 2 Sb 2 =»...=> a n Sb n => a n+1 5 n+1 , n > 0. 

Thus, two subtrees of derivation trees with respect to G are different if they 
have different heights. Therefore, we have: 

NC(G) = {a t a n b n a rn b m b l \ i > 0,n, m > l,n ± m). 

Let us assume that NC(G) G CF. Consider a context-free grammar G r = 
(N,{a,b},S,P) such that L(G') = NC(G). In order to generate the prefix a 1 
and the suffix b x in strings a x a n b n a 7n b 7n b x of NC(G ), we need a derivation X =>* 
a^Xb?, j > 1,X E N. The rules used in such a derivation are not used when 
producing substrings a n b n or a m 6 m , because from X we have to generate substrings 
of the form a k a n b n a Tn b Tn b l . If in a subderivation leading to a block a n b n or a m 6 m of 
a string a l a n b n a Tri b TTl b x , after generating some a s Yb s , we introduce X from Y , then 
strings not in NC(G) will be obtained. Therefore, if we remove from P all rules 
contributing to a derivation X =>* a j XlA as above, then we obtain a grammar 
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G" generating strings of the form a ll a n b n a 7n b 7n b 12 , for all n,m > 1, n ^ m, and 
with finitely many values for i\, i 2 . Because the pairs are well specified and 

finitely many, we can replace by A each occurrence of a and b in the rules of G” which 
contribute to the prefix a 11 and to the suffix b 12 , respectively. In this way, we obtain 
a context-free grammar G nt such that L(G"') = {a n b n a 7n b 7n \ n,m > l,n ^ m}. 

However, L = {a n b n a 7n b 7n | n, m > l,n^ m } is not a context-free language. 
Assume the contrary. It follows that also L ' = {a n b n ca 7n b 7n | n,m > l,n ^ m} 
is context-free. Take a context-free grammar Go for U . All recurrent derivations 
in Go must be of the form X =>* a l Xb l ,i > 0. (Any other type of recurrent 
derivations produces strings not in a n b n a 7n b Tn , even without imposing restrictions 
on the relation between n and m.) Replace by A each occurrence of b in Go . The 
obtained grammar, Gq, contain only recurrent derivations of the form X =>* 
a l X,i > 0. According to Theorem 5.5 in Salomaa (1973), the language L(Gg) 
must be regular. However, L(G' 0 ) = h(L'), for h(a) = a,h(c) = c,h{b) = A. 
Hence, 1/(^0 ) = {a n ca rn | n,m > l,n ^ m}. This is not a regular language, a 
contradiction which concludes the proof. 

Theorem 8. There is a regular grammar G such that NC(G) £ REG. 

Proof. Consider the grammar: 

G-({5},{a,6},5,{5^a5,5-.6}). 

We obtain: 



NC(G) = {a l a n 6a m 6 | i > 0, n, m > 0, n ^ m}. 

As in the previous proof, if NC(G) is regular, then a regular grammar generating 
{a n ba 7n b \ n,m > 0, n ^ m} can be constructed, which is contradictory, because 
this language is not regular. It follows that NC(G) £ REG , too. <> 

6 Central Coordination 

The case of The boys eat apples -h The girls eat bananas —* The boys and the girls 
eat apples and bananas (respectively) suggests the following variants of coordination 
operations. 

For x,y £ V*, we define the free central coordination by: 

Cf c (x,y) = { x'y'ux'y " \ x = x'ux",y = y'uy", for some u, x' , x" , y\ y" € V"}, 

whereeis the maximal central coordination is defined by: 

Cm C {x,y) = {x'y'ux'y" \ x = x* ux'\y = y* uy" , for u, x\ x\ y' , y" £ V* , 

and there is no proper superword of 
u which is common to x and y}. 

The proof of Lemma 1 holds true also for C a , C*, for a £ {/c, me}, whereas 
the proof of Lemma 2 can be modified in order to cover the new operation (see also 
the remarks before Theorem 3). We get: 
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Theorem 9. The families REG , CS , RE are closed under C a , CS, RE are 
closed under C *, too , but LIN, CF are closed under none of these operations, 
a G {/c, me}. 

The central coordination can be naturally defined in the syntactically grounded 
variant, imposing that the substring u has the same syntactical description in both 
strings x,y. 

We denote by C c (G) the language of the frontier strings of trees obtained by 
central coordination of derivation trees in the context-free grammar G. Somewhat 
surprisingly, for this case we do not have a result like Theorem 4 above. 

Theorem 10. There is a linear grammar G such that C c {G) £ CF. 

Proof Consider the grammar: 

G = ({S, A, B,C},{a,b,c,d,e}, S,P), 

P = {S A,S -> B,A^ aAb,B -> cBd,A-+ C,B -> C,C -> e). 

All strings of the form x = a n eb n , y = c^ed™, n,m > 0, are in L(G) and they 
have the (maximal) common subword e. Moreover, in any derivation tree there is 
the subtree determined by C — ♦ e. Thus we have: 

C c (G) fl {a,c}*e{MF = {a n c rn eb n d rn \ n,m > 0}, 

a language which is not context-free. 0 

One can easily see that for each regular grammar G we have C c {G) G REG : two 
derivation trees with respect to G can have in common only a subtree corresponding 
to a final part of the associated derivations, so the central coordination is, in fact, 
a “suffix coordination” . 



7 Concluding Remarks 

The following table synthesizes the results concerning the closure properties of 
families in the Chomsky hierarchy under the (non-iterated) coordination operations 
considered above (the notations are those used before, U stands for “undefined”) 





Cf p 


Gmp 


C rp 


Cf k 


C r b 


c 


NC 


Cfc 


G me 


Cc 


REG 


Yes 


Yes 


Yes 


Yes 


Yes 


Yes 


No 


Yes 


Yes 


Yes 


LIN 


No 


No 


No 


No 


No 


No 


No 


No 


No 


No 


CF 


No 


No 


No 


No 


No 


Yes 


No 


No 


No 


No 


CS 


Yes 


Yes 


Yes 


Yes 


Yes 


u 


u 


Yes 


Yes 


u 


RE 


Yes 


Yes 


Yes 


Yes 


Yes 


u 


u 


Yes 


Yes 


u 



Several interesting conclusions can be drawn on the basis of the results in this 
table. 

With only one exception, that of unrestricted syntactically grounded coordina- 
tion, none of these operations preserves the context-free languages. 
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Coordination is a basic linguistic operation, probably present in all languages. 
We have considered here many variants, both at the surface level of strings and 
taking into consideration derivation trees, hence the structure of strings. Thus, we 
can claim, at least statistically, that we have captured the idea of coordination by 
our definitions. Consequently, we can infer: natural languages contain non- context- 
free specific constructions. Coordination is one of them. 

This does not necessarily implies that natural languages, English for instance, 
are not context-free languages in the restricted sense, as sets of strings, it merely 
means that natural languages contain constructions which cannot be handled by the 
context-free grammar formalism. 

The only case when context-free languages are preserved, that of unrestricted 
syntactically grounded coordination, can be considered as insufficiently adequate, 
as it allows trivial coordination. Significantly enough, when nontrivial coordination 
is considered, the context-freeness is again lost. Moreover, there are linear gram- 
mars leading to non-context-free languages by nontrivial syntactically grounded 
coordination. (The fact that the family of linear languages is closed under none of 
the considered operations is not a surprise, because this family is not closed under 
concatenation while coordination involves a sort of concatenation in its definition.) 

If context-free grammars are not sufficient, then what else ? This is not an easy 
question. The family of context-sensitive languages is closed under all coordination 
operations (except those syntactically grounded, which make no sense in this case as 
we have no available derivation tree). This confirms the general belief that natural 
languages lie somewhere at the context-sensitive level of the Chomsky hierarchy, 
but this is a loose conclusion: context-sensitive grammars are “too powerful”. A 
standard candidate are the Tree Adjoining Grammars (TAG), of various forms 
(with constraints, dependencies, etc). How the corresponding families of languages 
behave with respect to the previous coordination operations (on strings or on trees) 
remains as a research topic. The answer is of a definite interest for the adequacy 
of TAG’S as models of the syntax of natural language. 
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Abstract 

In this paper we describe research on compound analysis in the UPLIFT 
information retrieval project. Results of earlier experiments indicated that 
splitting up compounds in the query and forming new compounds by com- 
bining query terms improves recall while precision does not deteriorate. We 
investigated whether adding syntactic constraints to the compound splitting 
and formation processes would improve our initial results. We compared dif- 
ferent strategies for compound formation and we also investigated the effect 
of adding compound constituents as separate index terms. The results of our 
experiments show that using information about head-modifier relationships 
to create complex index terms can improve both recall and precision signi- 
ficantly but only if all constituents are also added separately. We found that 
using both noun-adjective and noun-noun head-modifier pairs produced the 
best results. 



Introduction 

The work described in this paper is part of the UPLIFT project 1 . UPLIFT in- 
vestigates whether linguistic tools can improve and extend the functionality of 
vector space text retrieval systems (cf. Salton (1989), p. 312 ff). Earlier exper- 
iments in the UPLIFT project focussed on improving recall by using stemming 
algorithms 2 . This paper describes an experiment with syntactic phrase indexing 
techniques for Dutch texts, aimed at improving precision as well as recall. The 
basic idea behind phrase indexing is that phrases characterize document content 
.more effectively than single word terms. When a single word index is used, a query 
containing the phrase information retrieval will also match with documents con- 
taining only information or retrieval. If information retrieval is recognized as a unit, 
however, these matches may be avoided or given a much lower score (depending 
on the matching strategy). Different strategies have been used to identify suitable 

’’'Utrecht Institute of Linguistics OTS 

^Netherlands Organization for Applied Scientific Research (TNO), Institute of Applied Physics 
1 UPLIFT: Utrecht Project: Using Linguistic Information for Free Text retrieval. UPLIFT 
Home Page: http://www-uilots.let.ruu.nl/ ~ uplift 
2 See Kraaij and Pohlmann (1996) for details. 
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phrases for indexing, the most important distinction being between strategies based 
on statistical co-occurrence data and strategies based on syntactic processing. So 
far, both types of strategies have proven to be equally successful (cf. e.g. Fagan 
(1987), Salton et al. (1990) and, more recently, Hull et al. (1997)). Results of 
earlier experiments in the UPLIFT project motivated us to take compounds as a 
starting point for our experimentation with phrase indexing. Our approach was 
further inspired by the work of Strzalkowski, as described in Strzalkowski (1995) 
and Strzalkowski and Perez Carballo (1996). Strzalkowski uses syntactic informa- 
tion to identify phrases in queries and documents. These phrases are subsequently 
normalized (i.e. semantically similar but syntactically different constructions, e.g. 
retrieval of information vs. information retrieval , are represented identically) as 
head-modifier pairs. Other recent work on syntactic phrase indexing includes Evans 
and Zhai (1996) and Smeaton et al. (1995). In sections 1 to 5 we describe our ap- 
proach and discuss the set-up and the results of the experiments. In section 6 we 
present the conclusions and give some possibilities for further research. 



1 Compounds and related constructions 

Earlier research in the UPLIFT project showed that when a query is expanded 
with the constituents of compounds already occurring in it 3 and new compounds 
are added to the query by combining query terms, recall improves while precision 
does not deteriorate. The following example illustrates this approach. 

Query: Ik zoek documenten over computers en natuurlijke taalverwerking 
(“I am looking for documents on computers and natural language pro- 
cessing” ) 

This query would result in the following index terms (after removal of stop words): 
document 
computer 
natuurlijk 
taalverwerking 

taal compound splitting 

verwerking ” 

computertaal compound formation 

taalcomputer ” 

In the example, the compounds computertaal (computer language) and taal- 
computer (language computer) are added to the query by combining computer and 
taal Both are valid compounds 4 but, although the second compound may retrieve 
relevant articles for this query, the first (a synonym for programming language) 
will probably retrieve many unrelated documents. 

3 In Dutch, compounds are usually written as a single orthographic unit, e.g. levensverzeker- 
ingsmaatschappij (life insurance company). As a result of this, compound constituents are nor- 
mally not considered as separate index terms. 

4 New compounds are validated using a list of all the compounds found in the document 
collection. 
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We decided to investigate whether it would be possible to improve precision 
as well by using syntactic information to constrain the compound splitting and 
compound formation processes. 

We restricted compound splitting by creating system variants which only add 
the heads or both heads and modifiers as separate index terms. To split up com- 
pounds into their constituents we used the dictionary- based compound splitter 
developed by Theo Vosse for the CORRie project (cf. Vosse (1994)). The com- 
pound splitter does not assign structure to the compound but simply yields a list 
of constituents. Identifying head-modifier relationships in compounds is not trivial 
because of possible structural ambiguities. In Dutch, compounds existing of two 
parts are usually right-headed (a fietswiel is a type of wiel) but compound construc- 
tion is recursive and both the head and the modifier can be compounds themselves, 
resulting in structural ambiguities, e.g. [[XI X2] X3] or [XI [X2 X3]]. We have not 
attempted to implement a strategy to solve all structural ambiguities in compounds 
but we have applied two different heuristics to assign probable structures. In a re- 
cent study, ter Stal (1996) found that simply assuming that all compounds have a 
left- branching structure produced ± 70% correct results. Although his results are 
for English, we decided to try this strategy. As an alternative, we also implemented 
a strategy where we use unambiguous cases collected from the corpus to confirm a 
certain choice. If we find independent evidence for a left-branching structure (XI 
modifies X2 in unambiguous contexts) or a right-branching structure (XI modifies 
X3) we select the appropriate structure. If we do not find independent evidence 
for either structure we choose a left-branching structure by default 5 . 

The formation of new compounds was restricted by using only terms which oc- 
cur in a certain syntactic context to generate new complex terms. We restricted 
compound generation to term pairs originating from complex Noun Phrases (NPs) 
containing a specific type of Prepositional Phrase (PP) (with the preposition van , 
voor or door) as a noun post-modifier. The choice for this construction was mo- 
tivated by the fact that many compounds in Dutch can be paraphrased using a 
specific type of PP, e.g. fietswiel (bicycle wheel) <-► wiel van een fiets (wheel of a bi- 
cycle), see, for instance, Geerts et al. (1984) p. 103. The term pairs were created by 
combining the head noun of the main NP with the head noun of the NP contained 
in the PP. Figure 1 illustrates this process. PP-modification structures exhibit 
similar ambiguities to the ones in complex compounds, e.g. in the man with the dog 
with the spots it is not clear whether the PP with the spots modifies the man or the 
dog. We decided to treat these structures analogously to the compound structures. 
The default strategy we adopted was to assume that PP modification structures 
are right-branching (i.e. each PP modifies the noun immediately preceding it). 
We again also implemented a second strategy, using corpus data for disambigu- 
ation. We later extended compound generation by using all PP post-modifiers and 
adjective pre-modifiers as well. 

To ensure matching, both original and new complex terms were normalized as 
head-modifier pairs. Complex constructions consisting of more than 2 constitu- 
ents are represented as several head-modifier pairs. See figure 2 for an example. 

5 A third option, where the structure is simply left ambiguous and all interpretations are 
selected, was not implemented for lack of time. 
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Furthermore, both queries and documents were treated analogously. This was not 
possible in our earlier approach (combining all the terms in a document to create 
compounds is clearly not feasible). In this way we ensured that matches between 
compounds and equivalent constructions would be given the same score as literal 
matches. 



N (HEAD) 

I 

wiel 




PP (MOD) 

PREP^^^NP 
van DET^T(HEAD) 



een 



fiets 



— ► wiel+fiets 



Figure 1: Term pair extraction from NP with PP modifier 



N 




MOD HEAD klem 



fiets wiel 



— * wield- fiets, klem+wiel 



Figure 2: Term pair extraction from complex compound 



2 Indexing module 

Based on the options described in section 1 above we developed several different 
versions of an indexing module and integrated each of these with our retrieval 
engine 6 to create different system variants. The indexing modules consist of the 
following basic sub-routines. 

A string segmentation algorithm (tokenizer) is used to identify sentence and 
word boundaries. 

A lexical look-up algorithm, based on the CELEX lexical database for Dutch 
(Baayen et al. (1993)) assigns part-of-speech tags to the words. 

A tagger is used to resolve ambiguities in tag assignment. We used the Multext 
tagger, (cf. Armstrong et al. (1995)), a Hidden Markov Model tagger, which has 

6 The retrieval engine used in the UPLIFT project is the TRU vector space engine developed 
by Philips Research (cf. Aalbersberg et al. (1991)). 
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text-* 


tokenizer 


-+ 


lexical look-up 


- 


tagger - 


pair extraction 


- 


stop words 


- 


stemmer 


- 



proper names 



NP-parser 



Figure 3; Indexing process 



the advantage that it requires only a partially disambiguated corpus for training. 
After training, the tagger produced 91.5% correct results. 

A very simple heuristic based on the distinction upper case-lower case is used 
to glue sequences of proper names together (e.g. VerenigdeStaten-van^Amerika 
(United States of America)). In this way we ensure that proper names are treated 
as a unit and term pairs are not extracted from them. 

An NP-parser is used to identify NPs in the texts. The parser we use was 
developed by TNO-TPD, (cf. van Surksum and den Besten (1993)). This parser 
is deterministic and requires fully disambiguated input from the tagger. It is also 
robust and fast (244 sentences per second on a Sun-Spare 10/40). The coverage of 
the NP-grammar is not complete 7 , but for the purpose of our experiment it was 
considered to be sufficient. 

Since the parser is deterministic and only generates one analysis for ambiguous 
structures, separate pair extraction modules extract the appropriate word pairs 
and single words from the output of the parser. 

A stop word list is used to identify and eliminate so-called stop words (mostly 
function words). 

Finally, all remaining words (and compound constituents) are replaced by their 
stem using a dictionary-based (CELEX) stemming algorithm. We used the best 
variant of all the stemming algorithms tested in previous UPLIFT experiments (cf. 
Kraaij and Pohlmann (1996)). This variant handles inflection only. Figure 3 shows 
how the different sub-routines work together. 



3 System variants 

We developed and tested a large number of system variants (23). These variants 
are summarized below. The names are abbreviations of the type vABC which 
must be interpreted as follows: A refers to the syntactic context from which of the 
head-modifier pairs are generated, B to the strategy used for the disambiguation of 
complex structures and C to the treatment of constituents of complex structures. 

vXXX No compound analysis but tagging, proper name identification and stem- 
ming are included. We added this version to see whether tagging, stemming 
and proper name recognition alone would already be sufficient to improve 
precision. 

vM.. Head-modifier pairs are generated from compounds. 

7 Relative clauses, for instance, are not included. 
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vS.. Head-modifier pairs from complex NPs with specific PP post-modifiers (see 
section 1 above) are added. 

vP.. Head-modifier pairs from all PP post-modifiers are added. 
vA.. Head-modifier pairs from adjective pre-modifiers are added. 
vAP.. A combination of the two previous versions, 
v.a. Complex terms are analyzed using the default strategies, 
v.c. Complex terms are analyzed using corpus data. 

v..l All constituents of complex terms are also added separately to the index. 

v..2 Only heads (including heads of complex modifiers) are added to the index 
separately. 

v..3 Only the head of the entire complex construction is added as a separate index 
term. 

v..4 Constituents are not added separately. 

We compared these variants with the following two versions: 

vn Baseline. TRU retrieval engine, no extensions. 

c4fow Best version from previous experiments, all constituents of compounds are 
added to the query and new compounds are generated by arbitrarily combin- 
ing query terms. 

4 Test procedures 

The test collection used for the experiments was compiled during previous research 
in the UPLIFT project on stemming algorithms. It consists of a document collec- 
tion of 59,608 articles published in Het Eindhovens Dagblad , Het Brabants Dagblad 
and Het Nieuwsblad from January to October 1994 and 36 queries and relevance 
judgements. Some general statistics for the document collection are given in table 
1 below. 



Total number of documents 59,608 

Total number of words (tokens) 26,585,168 

Total number of terms (types) 434,552 

Max number of words per document 5,979 

Av. number of words per document 446 

Max number of terms per document 2,291 

Av. number of terms per document 176 



Table 1: Document collection statistics 



